Abstract

A logic-timing simulator is described for a hypothetical
multiprocessor consisting of CRAY-1s connected to a common memory.
This simulator is written in Fortran-IV and IBM assembly language to
execute on an Amdahl 5860 machine operating under the Michigan
Terminal System.

The simulator provides extensive reporting of individual
CRAY-1 processor resource usage, resource conflicts, and interprocessor
communication. By calling the simulator as a subroutine,
the user may flexibly use program simulation within a larger problem
environment. Extensive interactive debugging features make
the CRAY multiprocessor simulator a useful tool for (1) gaining
general insight into the design of multiprocessor algorithms, and
(2) developing assembly language programs for CRAY processors
with instruction sets similar to the CRAY-1.
Preface

The  simulator  described  in  this  report  was  developed  to 
support  general  vector  multiprocessor  algorithm  studies.  It 
was  felt  to  be  of  sufficient  general  interest  and  utility  that 
this  documentation  was  prepared. 

The  simulator  accepts  machine  code  from  a  cross  assembler 
developed  at  the  University  of  Michigan  and  described  in  SEL 
Report  #120  and  in  the  Appendix  of  this  report. 

Both the cross-assembler and the simulator will be available
from Professor D. A. Calahan in January, 1984.
1.  Introduction

The original University of Michigan Cray-1 (uniprocessor)
simulator was written during 1977-78 by D. A. Orbits. The decision
to build a simulator was motivated by the following
considerations:

(1) At the time, access to a Cray-1 for the purposes of
algorithm design and code development was often very difficult and
access on any continuing basis for research purposes was not
possible.

(2) Even with access to a Cray-1, it was often quite difficult
to analyze algorithm performance. There was no hardware instrumentation
on a Cray computer to permit a study of CPU resource usage
and conflict. The Cray-1 simulator provided a detailed report of
CPU activity.

(3)  For  algorithms  which  must  be  carefully  designed  and  coded, 
the  programmer  could  use  the  simulator  to  analyze  instruction  delays 
and  re-order  instructions  as  necessary  to  minimize  conflicts. 

(4) When debugging programs, it was useful to have interactive
control of program execution. Through the use of break-points,
at-points and command files, the simulator lends considerable flexibility
to the debugging process. (Note: The CTSS operating system now
provides many of these capabilities.)

(5) With simulation it was possible to study the impact of architectural
modifications on algorithm performance.

A somewhat similar situation exists with respect to the Cray
XMP and other presently unannounced Cray multiprocessors. Availability
is currently restricted. Although the significance of assembly
language (CAL) coding may be reduced in future machines, there is a new
requirement to study the organization and efficiency of various tasking
strategies on kernels, scientific libraries, and entire application
programs.

The simulator described in this report is intended to support
such study. It contains two major extensions of the CRAY-1 simulator:

(a) A number (p = 4, but alterable) of CRAY-1s are connected to
the same common memory. Each processor has the instruction
set and timings of the CRAY-1. This is, of course, a hypothetical
or paper machine. Intraprocessor but not
interprocessor bank conflicts are modeled.

(b) Hardware semaphores and shared registers have been
added to the CRAY-1 architecture (see Appendix L),
and assembly instructions are included similar in format
to those of the Cray XMP, to assist in program development
for this machine. However, the timing of these
instruction executions is different from the Cray XMP,
and may be changed as we feel appropriate. Thus, the
timings produced by this simulator are advisory, vis-a-vis
the precise timings of the parent Cray-1 simulator.

In this report, the designation Cray-1 will be used to denote one
of the processors or its instruction set; the term Cray-M will denote
the entire simulated multiprocessor.

In  summary,  this  software  can,  at  a  minimum,  yield  insight 
into  the  interplay  of  hardware  and  algorithms  by  direct  control 
from  CAL  of  the  hardware  multitasking  facilities.  Beyond  this,  it 
may  be  that  certain  high-performance  library  routines  and  other 
algorithms  requiring  complicated  tasking  and  sub-tasking  strategies 
can be best implemented with the simulator, analogously to the CAL HYP AC
linear algebra library developed with the Cray-1 simulator.


2.  Simulator  Features 

This  section  of  the  user  manual  has  been  divided  into 
six  sub-sections,  each  devoted  to  a  particular  aspect  of  the 
Cray-1  simulator.  No  attempt  has  been  made  to  describe  the 
architecture  of  the  Cray-1  itself.  The  bibliography  lists 
several  sources  for  this  information. 

The  following  is  an  overview  of  the  material  covered  in  this 
section : 

(1)  Sub-section  2.1  is  an  introduction  to  the  simulator 
command  language  and  the  running  of  simulated  programs. 

(2) Sub-section 2.2 covers the exceptional conditions that
may arise when using the simulator.

(3) Sub-section 2.3 covers the subroutine interface through
which a Fortran program may call the Cray-1 simulator. This is
useful for simulating only a portion of a program, while retaining
the rest of it in Fortran-IV for either cost or convenience reasons.

(4)  Sub-section  2.4  covers  the  simulator  exit  processing. 
Through  the  Cray-1  Exit  instruction  the  user  may  have  the  simulated 
program  call  a  user  provided  subroutine  to  perform  functions  that 
might  be  provided  by  the  operating  system  or  the  subroutine 
libraries  in  an  actual  Cray-1  environment. 

(5) Sub-section 2.5 covers the report generation facilities
of the simulator. This reporting is controlled by the CPACT, STAT,
TACT, and TRACE commands.

(6) Sub-section 2.6 covers inconsistencies between the
simulator and the Cray-1 computer that are presently known.
Unimplemented instructions are discussed here along with other
minor inconsistencies such as data formats and timing inaccuracies.


2.1 Command Language


The  command  language  provides  the  user  interface  to  the 
Cray-M  simulator.  Through  the  command  language,  the  user  controls 
and  monitors  the  progress  of  the  simulated  program.  The  user  has 
considerable  flexibility  in  controlling  input  to  and  output  from 
the simulator. This section is organized into the following three
sub-sections:

(1)  Command  language  input  control 

(2)  Command  language  output  control 

(3)  Running  programs  on  the  Cray-M  simulator 


2.1.1  Command  Language  Input  Control 

Upon  initiation,  the  simulator  will  prompt  for  terminal  input 
by  typing  a  period.  The  user  may  then  enter  a  command  or  redirect 
the  command  input  stream  to  read  from  a  file  via  the  USE  command. 
The  filename  parameter  on  the  USE  command  directs  the  simulator  to 
open  that  file  and  begin  reading  commands.  Upon  an  end-of-file 
condition  the  input  stream  is  switched  back  to  the  terminal. 

More  than  one  USE  command  may  be  issued,  allowing  nested 
command  files  to  be  built  by  the  user.  The  simulator  command 
language  maintains  a  command  stream  input  stack  which  controls 
the  issue  of  nested  USE  commands. 

The command stack is also used when the simulator is called
as a subroutine (see section 2.3). For subroutine usage, the
caller supplied command string is split at the command separator
character (a semi-colon) and each command is written to a scratch
file. This scratch file is termed the call-file. The call-file
is terminated with a RETURN command, so that after execution of
the caller commands automatic return is made from the simulator
to the caller. After creating the call-file, the subroutine
interface pushes the call-file onto the command stack causing
subsequent commands to come from the call-file.


Another  use  of  the  command  stack  arises  from  the  use  of  AT 
points  that  may  be  set  by  the  user.  An  AT  point  is  similar  to  a 
break  point,  in  that  each  is  set  at  some  instruction  address  in 
the  user's  program.  Upon  hitting  a  break  point,  program  simulation 
is  halted  and  control  reverts  to  the  terminal  allowing  the  user  to 
monitor the program's behavior. An AT point differs, in that when
it  is  created  the  user  may  also  enter  one  or  more  simulator  commands 
that  will  be  automatically  executed  when  the  AT  point  is  hit.  These 
commands  are  saved  in  a  scratch  file  and  then,  during  simulation 
when  the  AT  point  is  hit,  the  simulator  pushes  the  AT  point's  scratch 
file  onto  the  command  stack  causing  subsequent  commands  to  come  from 
the AT file. A RUN command is automatically placed at the end of the
AT file, causing simulation to resume uninterrupted after the AT commands
have been processed. AT commands are useful for automatically
displaying register or memory locations at selected points in a program.
In cases where the user wishes to display various locations
and then regain control for other purposes, entering the command
USE *MSOURCE* will switch command input to the terminal during AT
command processing. Any end-of-file condition
will terminate input from the top entry of the command stack,
causing the stack to be popped and input to continue from the previous
source. In the case of an AT file with a 'USE *MSOURCE*'
command in it, an end-of-file condition from the terminal will resume
simulation. In fact, when a break point is hit, the simulator automatically
issues an implied 'USE *MSOURCE*' command which reverts
control to the terminal.

The command stack is fifteen levels deep with the base entry
preset to *MSOURCE*, which can never be popped. Only one AT or BREAK
point can be hit at any time; therefore, a subsequent RUN command will
pop the command stack through the last AT or BREAK entry on the stack.
Upon a RETURN command the command stack will be popped through the
last call-file entry on the stack.
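The stack discipline described above can be modeled in a few lines. The following Python sketch is only an illustration, not the simulator's code (which is Fortran-IV and IBM assembly). The entry labels, the class and method names, the choice to count the base entry among the fifteen levels, and the decision to raise an exception on overflow are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 15                      # "fifteen levels deep", per the text
BASE = ("terminal", "*MSOURCE*")    # base entry, never popped


class CommandStackOverflow(Exception):
    """Raised when a push would exceed the stack depth (assumed behavior)."""


@dataclass
class CommandStack:
    # Each entry is (kind, name). The kinds "terminal", "use", "call",
    # "at" and "break" are labels invented for this sketch.
    entries: list = field(default_factory=lambda: [BASE])

    def push(self, kind, name):
        if len(self.entries) >= MAX_DEPTH:
            raise CommandStackOverflow(name)
        self.entries.append((kind, name))

    def end_of_file(self):
        """EOF on the top entry pops it; the base entry stays."""
        if len(self.entries) > 1:
            self.entries.pop()

    def _pop_through(self, kinds):
        """Pop down to and including the last entry whose kind is in kinds."""
        for i in range(len(self.entries) - 1, 0, -1):
            if self.entries[i][0] in kinds:
                del self.entries[i:]
                return True
        return False

    def run(self):
        """RUN pops through the last AT or BREAK entry."""
        self._pop_through({"at", "break"})

    def ret(self):
        """RETURN pops through the last call-file entry."""
        self._pop_through({"call"})

    def reset(self):
        """'Command Stack Reset': clear everything above the base entry."""
        del self.entries[1:]

    @property
    def source(self):
        """Name of the stream that commands are currently read from."""
        return self.entries[-1][1]
```

For instance, pushing a call-file, then an AT file, then a `USE *MSOURCE*` entry and signalling end-of-file at the terminal returns input to the AT file; the RUN at the end of the AT file then pops back to the call-file.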

Occasionally, due to an error condition, the message "Command
Stack Reset" will be printed. This means that the command stack has
been cleared to the base entry which is preset to USE *MSOURCE*. This
assures that the error condition will return input control to the user.
However, this means that any commands not yet executed in any outstanding
call-files, AT files or USE files have been lost.

A keyboard attention interrupt will cause the command stack
to be reset. This is useful to stop a USE file or prevent subsequent
commands in the call-file from being processed.


2.1.2  Command  Language  Output  Control 

Normal output from the simulator (informational messages, DISPLAY
output, etc.) can be sent to another I/O unit by using the SET
command to switch the output device. For example, SET OUTPUT = -F1
would route the output to file "-F1".


Error messages are output on a different unit number and always
go to *MSINK*. If an error situation arises causing the message
"Command Stack Reset" to appear, the output device will be switched
back to *MSINK*, if it was diverted elsewhere. Also, a keyboard
attention will switch the output back to *MSINK*.


2.1.3  Running  Programs  on  the  Cray-M  Simulator 

Before a program may be run on the simulator, it must first be
translated to a format acceptable for loading into the simulator.
This translation is typically done via a Cray-M cross assembler.
This assembler generates absolute or relocatable load modules that
can be loaded by the simulator LOAD command. The format of the
load module is described in Appendix I.

When designing a Cray-M program to be simulated, consideration
must be given first to the nature of the algorithm under study. If
the program requires some initialization which will not be written in
Cray-1 assembly language, then perhaps the simulator should be called
as a subroutine. It is possible for the calling program and the
simulator to both share the Fortran common block that is used for the
simulated Cray-M memory. In fact, the user may increase the size of
the simulated Cray-M memory beyond the 4096 IBM double-words that
are presently allocated.
2.1.4  Simulator  Control

To  keep  the  simulator  from  running  away  from  the  user  a  keyboard 
attention  interrupt  can  be  signalled  which  has  the  following  effects: 

(1)  Resets  the  command  input  stack  to  read  from  *MSOURCE*  (the 
terminal),  losing  any  outstanding  command  files. 

(2) Resets the output device back to *MSINK* (the terminal).

(3) Performs the following command dependent actions:

3.1) For a DISPLAY command, an attention will terminate
the output. This is useful if a long display region was
accidentally displayed.

3.2) For a HELP or STAT command, an attention will terminate
the output.

3.3) For a RUN command, an attention will stop the simulation
and print the parcel address of the next instruction
to be executed. Simulation may be resumed without any loss
of timing information by just entering a "RUN" command. No
parcel address should be supplied on the RUN command, as
this always forces a buffer fetch which will make the timing
inaccurate.

(4) If for any reason the simulator seems to be looping and
not responding to attentions, two attentions will return
control to MTS.

Attention  trapping  is  only  enabled  while  control  is  inside  the 
simulator  or  the  command  language.  That  is,  if  the  simulator  is 
called  as  a  subroutine,  attention  trapping  is  enabled  only  while  a 
call  to  the  CRAY1  interface  subroutine  is  active. 
If the algorithm under study requires the use of intrinsic
functions, such as SQRT, SIN, COS, etc., which would be supplied by
some Cray-M subroutine library, the user may provide these functions
through the use of the Cray-M simulator EXIT instruction dispatcher.

The EXIT instruction (assembler mnemonic EX exp) contains a
9 bit expression field. If this field is non-zero the simulator
will call a subroutine called CRAYEX, passing the value of the
expression field and several register arguments to it. The user
may write a CRAYEX subroutine to process these EXIT codes and perform
any function he wishes to define. For example, an EXIT code of
one could be defined to perform a square root operation. This EXIT
feature avoids the expense of simulating Cray-M code for such intrinsic
functions by allowing them to be programmed directly on the
host machine. See section 2.4 for a complete discussion of the
EXIT dispatcher.
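The dispatching idea can be illustrated with a small Python model. This is a sketch only: the real CRAYEX is a user-written subroutine on the host machine whose argument list is given in section 2.4, not here. The register name `S1`, the handler table, and the error handling for undefined codes are assumptions made for the example.

```python
import math

EXIT_FIELD_BITS = 9  # width of the EX expression field, per the text


def exit_sqrt(regs):
    # Assumed convention: argument and result both travel in register S1.
    regs["S1"] = math.sqrt(regs["S1"])


# Hypothetical table of user-defined EXIT codes; code 1 is the
# square-root example mentioned in the text.
EXIT_HANDLERS = {1: exit_sqrt}


def dispatch_exit(code, regs):
    """Return True if a user routine handled the EXIT, False for a zero field."""
    if not 0 <= code < 2 ** EXIT_FIELD_BITS:
        raise ValueError("EXIT code does not fit in 9 bits")
    if code == 0:
        return False  # zero field: the user routine is not called
    handler = EXIT_HANDLERS.get(code)
    if handler is None:
        raise KeyError("no handler defined for EXIT code %d" % code)
    handler(regs)
    return True
```

The point of the design is that the square root is computed by one host-machine call rather than by simulating the Cray-M instructions of a library routine.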

Several  other  differences  between  a  Cray  computer  and  the 
simulator  arise  due  to  the  nature  of  the  IBM  370  architecture  upon 
which  the  simulator  runs. 

To  speed  the  simulation  of  arithmetic,  all  the  arithmetic  is 
done  using  the  IBM  370  arithmetic  instructions.  The  alternative 
would  be  to  simulate  Cray  arithmetic,  further  raising  the 
simulation  cost.  As  a  consequence  of  using  host  machine  (IBM  370) 
arithmetic,  the  floating  point  data  format  is  different.  On  the 
Cray-1  the  sign  and  exponent  field  is  16  bits  wide  whereas  on  the  IBM 
370  it  is  only  8  bits  wide.  Further,  the  Cray-1  exponent  is  a  base 
2  exponent,  whereas  the  IBM  370  exponent  is  base  16.  Figure  2.1.1 
shows  the  different  formats. 


[Figure: Cray-1 format fields: SIGN, EXPONENT, COEFFICIENT.
IBM 370 format (Long Floating-Point Number): sign (bit 0),
Characteristic (bits 1-7), 14-Digit Fraction (bits 8-63).]

Figure 2.1.1 - Cray-1 vs. IBM 370 Floating point
data formats
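To make the difference between the two formats concrete, the following Python sketch decodes a 64-bit word under each convention. The field widths and exponent bases follow the text above. The bias values (excess-64 for the IBM 370 characteristic, octal 40000 for the Cray-1 exponent) and the 48-bit coefficient width are not stated in this report; they are taken from the commonly published format descriptions and should be treated as assumptions here. The function names are invented for the example.

```python
def decode_ibm370_long(word):
    """Decode a 64-bit IBM 370 long floating-point word to a Python float."""
    sign = -1.0 if (word >> 63) & 1 else 1.0
    characteristic = (word >> 56) & 0x7F      # 7 bits, excess-64, base 16
    fraction = word & ((1 << 56) - 1)         # 14 hexadecimal digits
    return sign * (fraction / 2.0 ** 56) * 16.0 ** (characteristic - 64)


def decode_cray1(word):
    """Decode a 64-bit Cray-1 floating-point word to a Python float."""
    sign = -1.0 if (word >> 63) & 1 else 1.0
    exponent = (word >> 48) & 0x7FFF          # 15 bits, bias 0o40000, base 2
    coefficient = word & ((1 << 48) - 1)      # 48-bit coefficient
    return sign * (coefficient / 2.0 ** 48) * 2.0 ** (exponent - 0o40000)
```

Because the simulator performs arithmetic in the host format, a word stored by a simulated program has the IBM layout, and a bit pattern prepared for a real Cray-1 would be misread unless converted.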

The  simulation  of  the  instruction  computation  is  done  in  its 
entirety  when  the  instruction  issues.  The  pipeline  data  flow  in  the 
Cray-1  is  not  simulated.  This  means  that  upon  hitting  a  BREAK  or 
AT point, all results of prior instructions are available for inspection
or  modification.  The  instruction  where  the  BREAK  or  AT  point  is  set 
has  not  yet  been  executed. 

There  are  three  methods  for  controlling  the  simulation  of  a 
Cray-M  program: 

(1) BREAK points

(2)  AT  points 

(3)  An  instruction  issue  limit  parameter. 

BREAK and AT points may be set at a specified parcel address
in the simulated program. Setting a BREAK or AT point does not change
the instruction at that location; rather, BREAK and AT points are
detected by monitoring the P address register. This permits BREAK
and AT points to be set before the program is loaded or reloaded.


When a BREAK point is hit, control goes to the terminal. When
an AT point is hit, a predefined command file is processed which was
created when the AT point was set. Control will not go to the terminal
when an AT point is hit if no command causes this to happen.

An instruction issue limit may be provided as an optional parameter
on the simulator RUN command. For example, the following RUN
command would begin execution at the current program counter locations
and cause control to return to the command language after 2500
Cray-1 instructions have been issued in at least one processor (unless
an EXIT instruction or error condition occurred).

RUN  #2500 

The issue limit parameter is a decimal number prefixed by a pound
sign. If no issue limit is specified the remaining amount of a previous
limit is used (in the case of a BREAK or AT or attention). If
there is no remaining amount, a default value of 1000 is used. To
single step through a program use the command:

RUN  #1 
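The rule for choosing the issue limit can be summarized as a small function. The Python sketch below is an illustration under stated assumptions: the function names are invented, and the RUN syntax handled here is only the `#n` parameter shown above (the full syntax, including the optional parcel address, is given in Section 4).

```python
DEFAULT_ISSUE_LIMIT = 1000  # default stated in the text


def parse_issue_limit(command):
    """Return the explicit '#n' limit on a RUN command, or None if absent."""
    for token in command.split()[1:]:
        if token.startswith("#"):
            return int(token[1:])   # decimal number after the pound sign
    return None


def effective_limit(command, remaining):
    """Pick the issue limit for this RUN.

    remaining is the unused part of a previous limit (for example after
    a BREAK, AT or attention), or 0 if nothing is left.
    """
    explicit = parse_issue_limit(command)
    if explicit is not None:
        return explicit
    if remaining > 0:
        return remaining
    return DEFAULT_ISSUE_LIMIT
```

Under this model, a plain `RUN` entered after a break point that interrupted `RUN #2500` part-way continues with whatever remains of the 2500 instructions rather than starting a fresh count.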

While  in  the  command  language,  the  user  may  display  or  change 
registers  and  memory  locations  by  using  the  DISPLAY  and  CHANGE  commands. 

See  Section  4  for  command  descriptions  of  all  simulator  commands. 

The cost of simulating Cray-M programs is an important factor.
The simulator provides three levels of cost control:

Level 1 - Result computation only, which allows debugging but
          eliminates the cost associated with timing the Cray-1
          instructions.

Level 2 - Timing enabled, allowing the timing of the simulated
          program at a cost of about 5 times the level one
          cost, per processor.

Level 3 - CPACT (clock period activity report) enabled, increases
          the cost to about 20 times the level one cost, per
          processor.

Section  five  treats  the  cost  issue  further. 


2.2  Exceptional  Conditions 

While executing a Cray-M program, the simulator may encounter
any of several exceptional conditions which will halt the
simulation. The four possible exceptional conditions are listed
below followed by a discussion of each one:

1)  Error  exit 

2)  Program  range  error 

3)  Operand  range  error 

4)  Invalid  instruction  executed 


An occurrence of any exceptional condition will reset the command
stack and switch OUTPUT back to the terminal if it was diverted
elsewhere. The name of the routine being executed will be displayed,
if possible.

2.2.1  Error  Exit 

An  error  exit  is  caused  when  the  Cray-1  executes  a  zero  op-code. 
The  simulator  signals  this  condition  by  printing  the  message: 

ERROR  EXIT  AT  -  p-addr 

where p-addr is the parcel address of the error exit instruction.
Since memory is initialized with zeros when the simulator is started
up, a bad or missing branch could cause an error exit.

2.2.2  Program  range  error 

A  program  range  error  is  caused  by  a  branch  instruction  which 
attempts  to  jump  outside  the  limits  of  the  currently  defined  simulator 
memory.  If  used  stand-alone,  4096  words  of  simulator  memory  are 
available. A program range error is signalled by the message:

PROGRAM  RANGE  ERROR. 

BRANCH  AT  bch-p-addr 
TARGET  ADDRESS  WAS  tar-p-addr 
MEMORY  SIZE  IS  msize 
The parcel address of the offending branch instruction is
given by the bch-p-addr field. The invalid target address of the
offending branch instruction is given by the tar-p-addr field.
The memory size (msize) is printed in octal for comparison with
the invalid target address and to inform the user of the current
memory size.

If the user has tried to extend the size of the simulated
Cray-M memory by loading a longer common block, he must inform
the simulator of this by setting the MEMSIZ word in the MSIZE common
block to the correct size of the Cray-M memory (see section 2.3). If
the user forgets to do this, a size of 4096 is assumed, which may cause
the program range error.


2.2.3  Operand range error


An operand range error is caused by an operand load or store
that exceeds the limits of the currently defined simulator memory.
If used stand-alone, 4096 words of simulator memory are available.
An operand range error is signalled by the message:

OPERAND RANGE ERROR AT P = p-addr
MEMORY  SIZE  IS  msize 

The  parcel  address  of  the  offending  memory  reference  instruction 
is  given  by  the  p-addr  field.  The  memory  size  (msize)  is  printed 
in octal to inform the user of the current memory size. The comments
above (under program range error), about user extension of Cray-M
memory, apply here as well.

A  vector  load  or  store  to  memory  can  cause  an  operand  range 
error  in  several  ways: 

1)  The  base  address  may  be  out  of  range 

2)  The  operand  increment  may  be  too  large 

3)  The  vector  length  may  be  too  large. 
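The three causes listed above all reduce to one condition: some element address falls outside the simulated memory. The Python sketch below illustrates that check. It is not the simulator's code; the function name is invented, addresses are assumed to be word addresses starting at zero, negative increments are assumed to be allowed, and whether the simulator tests the whole vector before issuing or each element as it is referenced is not stated in this report.

```python
def vector_operand_in_range(base, increment, length, memsize):
    """True if every address base + i*increment (0 <= i < length)
    lies inside [0, memsize)."""
    if length <= 0:
        return True                      # nothing is referenced
    first = base
    last = base + (length - 1) * increment
    # The addresses are evenly spaced, so only the two ends need checking.
    return 0 <= min(first, last) and max(first, last) < memsize
```

With the stand-alone memory size of 4096 words, a 64-element load from base 4090 with increment 1 fails because of its length, and a 64-element load from base 0 with increment 100 fails because of its increment, even though both base addresses are valid.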


2.2.4  Invalid  instruction  executed

The monitor mode Cray-1 instructions are not implemented on
the simulator. When one of these is executed, the message

** ATTEMPT TO EXECUTE INVALID INSTRUCTION AT : p-addr

will be printed and the simulator will return to the command language.
The offending instruction's parcel address (p-addr) is printed
to aid in finding the instruction.


2.2.5  Floating  point  interrupt 

The floating point interrupt exception is handled differently
by the simulator than it is on the Cray-1. This discussion will deal
with the simulator response to a floating point interrupt. See the
Cray-1 Reference Manual for the Cray-1 response.

The  simulator  response  to  a  floating  point  interrupt  is  a 
consequence  of  the  behavior  of  the  IBM  370  architecture.  Three 
types  of  floating  point  interrupts  may  occur: 

1)  Exponent  overflow 

2)  Exponent  underflow 

3) Division by zero.

All three types of floating point interrupt may be suppressed
if the floating point interrupt bit in the Cray-1 mode register is
clear. When the simulator starts up, this mode register bit is set,
thereby enabling all three types of floating point interrupts. The
setting of this mode register bit may be controlled by the user in
two ways:

1) Through the SET EFI = [ON/OFF] command, the user may
enable or disable floating point interrupts.

2)  Through  the  Cray-1  instructions  EFI  and  DFI,  the  program 
may  enable  or  disable  floating  point  interrupts. 

Only one floating point interrupt is detected for each instruction
simulated. This means that if a vector instruction causes 20 exponent
overflows, only one will be detected. After the instruction has
finished executing the simulator will announce the floating point
exception (if the EFI mode bit is set) and return to the command
language.


When an exponent overflow occurs, the following message is
printed:

** EXPONENT OVERFLOW **

FLOATING POINT ERROR AT P = p-addr

When an exponent underflow occurs, the following message is
printed:

** EXPONENT UNDERFLOW **

FLOATING POINT ERROR AT P = p-addr

When a division by zero occurs, the following message is
printed:

** FLOATING POINT DIVIDE CHECK **

FLOATING POINT ERROR AT P = p-addr

For each of the three messages the parcel address (p-addr)
of the instruction causing the interrupt is printed.

2.2.6  Attention  interrupt 

To stop the simulation or regain control during command file
processing, the MTS terminal user may issue a keyboard attention
by hitting the break key or a control-E. This attention interrupt
will reset the command stack and halt simulation if in progress. If
simulation was in progress the message

** SIMULATOR ATTN AT P = p-addr **

will be printed, where p-addr is the parcel address of the instruction
to execute next if simulation is continued. An attention will cause no
information to be lost and simulation may be resumed, as if never interrupted,
by entering a RUN command without a p-addr parameter.


2.3  Subroutine  Interface

The Cray-M simulator may be called as a subroutine from a
user Fortran-IV program. Three benefits provided by this interface
are:

1) Being able to convert only a portion of a Fortran program
to Cray-1 assembly language allows you to simulate
the converted portion while leaving the remainder in Fortran
to run more efficiently on the host machine.

2) Being able to enlarge the amount of simulated Cray-1
memory by extending the memory common block in the
user's calling program and loading this program first.
This avoids the need for recompiling the simulator.

3) When studying a given algorithm for application to the
Cray-M, it is convenient to perform any housekeeping and
initialization functions in the user's Fortran program.
Therefore, only the algorithm need be coded in Cray-1
assembly language.

This  section  will  discuss  the  protocol  used  to  communicate  with 
the  simulator  from  a  calling  program.  This  communication  has  two 
aspects  to  it:  (1)  the  subroutine  interface  used  to  pass  commands 
and control to the simulator and (2) the shared Cray-M memory interface
used to pass data to and from the simulator.

2.3.1  Simulator  subroutine  call 

To  access  the  simulator  as  a  subroutine  the  following  Fortran 
subroutine  call  is  used: 

      CALL CRAY1('cmd[;cmd]...!', echosw)

The first argument is a literal string enclosed by apostrophes
which may be composed of one or more simulator commands. Each command
follows the same syntax as the commands described in section 4.
To  specify  multiple  commands  with  a  single  call  to 
the  simulator,  separate  the  commands  with  a  semicolon.  The  entire 
command  string  must  be  terminated  with  an  exclamation  point  and  may 
not  exceed  200  characters. 


The  second  parameter  (echosw)  is  a  logical  constant  or  variable. 
This  parameter  controls  the  echoing  of  the  commands  passed  in  the 
first  argument.  If  echosw  is  .TRUE.,  the  commands  will  be  echoed 
to  the  current  simulator  output  device  as  they  are  processed.  If 
echosw  is  .FALSE.,  command  echoing  is  suppressed. 
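The handling of the command string can be pictured with a short Python model. It follows the rules given in this section (semicolon separator, exclamation point terminator, 200 character limit) and the call-file description in section 2.1.1 (a RETURN command appended at the end). The function name is invented, and two details are assumptions: that the string ends at the first exclamation point, and that the 200 character limit includes the terminator.

```python
MAX_COMMAND_STRING = 200   # limit stated in the text
SEPARATOR = ";"
TERMINATOR = "!"


def build_call_file(command_string):
    """Split a CRAY1 command string into the lines of a call-file."""
    if len(command_string) > MAX_COMMAND_STRING:
        raise ValueError("command string exceeds 200 characters")
    end = command_string.find(TERMINATOR)
    if end < 0:
        raise ValueError("command string must end with an exclamation point")
    body = command_string[:end]
    commands = [c.strip() for c in body.split(SEPARATOR) if c.strip()]
    # The call-file always ends with RETURN so that control goes back
    # to the caller after the last user command.
    commands.append("RETURN")
    return commands
```

For a string such as `'LOAD TRIDEC;USE *MSOURCE*;RUN #2000!'` this yields the three commands followed by RETURN, which is the sequence the simulator would then read from the top of the command stack.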

If  the  user  wants  to  give  control  to  the  terminal  at  some  point 
in  the  command  string  sequence,  the  command  USE  *MSOURCE*  will  allow 
additional  commands  to  be  read  from  the  terminal.  For  example,  the 
call 

      CALL CRAY1('LOAD TRIDEC;USE *MSOURCE*;RUN #2000!', .TRUE.)

will cause the file TRIDEC to be loaded into the simulator memory,
after which the USE command will cause control to go to the user's
terminal, allowing breakpoints to be set, etc. An end-of-file condition
at the user's terminal (via ENDFILE, control-c, etc.) will
terminate the USE command, permitting the "RUN #2000" command to
be executed. When the last command in the command string is executed
an automatic return is made to the caller of the simulator. By setting
the echosw parameter to .TRUE., the three passed commands will
be echoed to the simulator output device as they are processed.

In order to call the simulator as a subroutine, the user's program
must first get control. To accomplish this, two things must be
done:

1)  The  user  program  must  be  set  up  as  a  main  program. 

2)  The  user  program  must  be  loaded  before  the  simulator  is 
loaded. 

This  is  a  consequence  of  the  following  two  facts: 

1) When MTS starts up a Fortran program (via the MTS $RUN
command), control is given to the main program.

2)  When  the  MTS  loader  encounters  more  than  one  main  program 
it  ignores  all  but  the  first  one. 

The  simulator  has  a  small  internal  main  program  which  gets  control  if 
the  simulator  is  run  stand-alone.  But,  if  the  user  writes  a  main 
program  and  loads  it  before  the  simulator  is  loaded,  the  simulator's 
main  program  is  ignored  by  the  loader.  Therefore,  when  loading  is 
finished  MTS  will  give  control  to  the  user's  main  program. 
As an example, suppose the user wrote the following Fortran
main program and compiled it into the MTS file MAIN.O.

      CALL CRAY1('USE *SOURCE*!',.FALSE.)
      STOP
      END

To  use  this  main  program  and  have  it  get  control  first,  use  the 
following  MTS  run  command: 

      $RUN MAIN.O+CRAY1

Although  most  user  main  programs  would  be  more  complicated  than 
this  one,  this  main  program  is  in  fact  the  small  internal  main 
program  used  by  the  simulator. 

2.3.2  Simulator  memory  sharing 

The simulated Cray-M memory can be shared by the simulator and the
user's calling program. This is accomplished by including in the
user's program the Fortran common block declaration used by the
simulator to allocate the Cray-M memory space. This common block
declaration appears in the simulator as follows:

      DOUBLE PRECISION MEM
      COMMON /MEMORY/ MEM(4096)
      COMMON /MSIZE/ MEMSIZ
      INTEGER IMEM(2,1)
      EQUIVALENCE (MEM(1), IMEM(1,1))

The MSIZE common block contains the single word MEMSIZ whose
value is the current size of Cray-M memory. MEMSIZ is used to
perform bounds checking on branches and memory references made by the
simulated Cray-M instructions. When the simulator is called for
the first time some once-only initialization is done which includes
zeroing all of Cray-M memory (MEM). Therefore, MEMSIZ must be
initialized properly before the first call to CRAY1. Further, since the
once-only initialization will zero Cray-M memory, the very first
call to CRAY1 must be made before the user's calling program
initializes any of MEM. It is suggested that this first initialization
call be made as follows:


      CALL CRAY1('INIT!',.FALSE.)


The  MEMORY  common  block  contains  the  array  MEM,  which  is  used 
as  the  Cray-M  memory  by  the  simulator.  This  is  declared  in  the 
simulator  to  be  4096  double  words  long.  The  user  may  extend  this 
common  block  to  enlarge  the  Cray-M  memory.  This  is  done  by  writing 
a  Fortran  main  program  which  includes  the  common  declaration  state¬ 
ments  shown  above,  but  with  the  4096  constant  replaced  with  a  larger 
value  as  needed.  Then  by  loading  the  user  main  program  first  (see 
section  2.3.1),  the  user's  main  program  not  only  replaces  the  simu¬ 
lator's  main  program,  but  the  user's  enlarged  version  of  the  MEMORY 
common  block  replaces  the  simulator's  version. 

To  pass  data  to  and  from  the  simulator  Cray-M  memory,  the  user 
need  only  read  and  write  data  to  the  MEM  array.  However,  because  the 
Cray-M  memory  address  starts  at  location  zero  and  Fortran  arrays  are 
indexed  beginning  at  one,  the  user  must  formulate  the  index  into  MEM 
by  using  the  Cray-M  memory  address  and  adding  one  to  it.  For  example, 
Cray-M memory location 3 is MEM(4).
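The address-to-index rule can be sketched in a few lines. Python is used here only for illustration; the simulator itself is Fortran. The function name `mem_index` and the bounds check against MEMSIZ are assumptions of this sketch, not part of the simulator.

```python
def mem_index(cray_addr, memsiz=4096):
    """Return the one-based index into MEM for a zero-based Cray-M address.

    The check against memsiz is an assumption of this sketch; the report
    only says that MEMSIZ is used for bounds checking inside the simulator.
    """
    if not 0 <= cray_addr < memsiz:
        raise ValueError("Cray-M address %d is outside 0..%d"
                         % (cray_addr, memsiz - 1))
    return cray_addr + 1


# Cray-M memory location 3 corresponds to MEM(4).
print(mem_index(3))
```

A calling program that enlarges MEM would pass its own MEMSIZ value as the second argument.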

The  following  example  is  the  skeletal  structure  of  a  user  main 
program  which  extends  Cray-1  memory  to  8192  words. 


      DOUBLE PRECISION MEM
      COMMON /MEMORY/ MEM(8192)
      COMMON /MSIZE/ MEMSIZ
      INTEGER IMEM(2,1)
      EQUIVALENCE (MEM(1), IMEM(1,1))
C
C     SET UP MEMSIZ WITH THE NEW MEMORY SIZE.
      MEMSIZ = 8192
C
C     DO SIMULATOR ONCE-ONLY INITIALIZATION
      CALL CRAY1('INIT!',.FALSE.)
C
C     ... User initialization of Cray-1 memory ...
C
      CALL CRAY1('LOAD UPROG;USE *MSOURCE*!',.TRUE.)
C
C     ... User prints out results of simulated computation ...
C
      STOP
      END



See section 3.2 for a more complete example of accessing the simulator
as a subroutine.
2.4 CRAYEX Exit Dispatcher

As discussed in section 2.3, it is often useful to allow the
Cray-M simulation to be embedded as a portion of a larger Fortran
program. Conversely, it is also useful to be able to call a Fortran
program from within the simulated Cray-M program. This transfer
of control from the Cray-M program to a Fortran program is accomplished
through the use of the Cray-1 exit instruction.

The  Cray-1  assembly  language  mnemonic  for  the  exit  instruc¬ 
tion  is  shown  below: 


EX  ijk 

The  exit  code  field  (ijk)  is  a  nine  bit  field  within  the  exit  in¬ 
struction.  Exit  codes  may  range  from  zero  to  511  decimal.  When 
the  simulator  encounters  an  exit  instruction,  it  checks  the  exit 
code  field  (ijk)  for  a  non-zero  value.  If  ijk  is  zero,  a  normal 
Cray-1  program  exit  is  performed.  If  ijk  is  non-zero,  the  simula¬ 
tor  will  call  the  subroutine  CRAYEX.  If  the  user  supplies  a  CRAYEX 
subroutine  and  loads  it  first  (see  section  2.3.1),  the  user's 
CRAYEX  routine  will  get  control.  If  no  user  CRAYEX  routine  is 
provided,  the  simulator  will  perform  a  normal  Cray-1  program  exit. 

If  the  user  provides  a  CRAYEX  routine  the  simulator  will  call  it  with 
the  following  Fortran  subroutine  call  statement: 

CALL  CRAYEX (IJK,  AREG,  SREG,  VREG,  VL,  EXSW) 

The  arguments  passed  by  the  subroutine  call  are  discussed  below: 

IJK  -  This  input  parameter  is  an  integer  which  contains  the  value 
of  the  ijk  field  in  the  exit  instruction.  It  may  be  used 
as  a  dispatch  parameter,  allowing  different  exit  codes  to  per¬ 
form  different  functions. 


AREG - This parameter is an eight element integer array used to
pass the Cray-1 A-register contents of the CPU that executed
the EX instruction to CRAYEX. This allows arguments to be
provided and results returned through the A-registers.
Cray-1 register A0 corresponds to AREG(1).

SREG - This parameter is an eight element double precision array
used to pass the S-register contents of the CPU that executed
the EX instruction to CRAYEX. This allows arguments to be
provided and results returned through the S-registers.
Cray-1 register S0 corresponds to SREG(1).

VREG - This parameter is a double precision array, dimensioned
as (64,8), used to pass the vector register contents of the
CPU that executed the EX instruction to CRAYEX. Arguments
may be provided and results returned through the vector
registers. Cray-1 vector register V0 corresponds to
VREG(-,1).

VL - This parameter is an integer which contains the value of
the vector length register of the CPU that executed the EX
instruction. On entry to CRAYEX, VL will always be between
1 and 64. VL may be changed by CRAYEX and this change will
be reflected in the Cray-1 vector length register. On
return from CRAYEX to the simulator VL must be in the range
of 1 to 64.

EXSW - This parameter is a logical variable. In the skeleton
dispatcher below, CRAYEX sets EXSW to .TRUE. when the exit code
is undefined, so that the exit is treated as a normal Cray-1
program exit.

In addition to the CRAYEX calling parameters, the CRAYEX program
may access Cray-M memory by sharing the memory common block
as described in section 2.3.2. This permits the CRAYEX routine
to perform major computation, I/O, etc., directly to the Cray-M
memory.


The  following  example  is  a  skeleton  CRAYEX  dispatcher. 


      SUBROUTINE CRAYEX(IJK, AREG, SREG, VREG, VL, EXSW)
      LOGICAL EXSW
      INTEGER AREG(8), VL
      DOUBLE PRECISION SREG(8), VREG(64,8)
C
C ... DISPATCH ON THE EXIT CODE.
      GO TO (100, 200, 300, ...), IJK
C
C ... EXIT CODE UNDEFINED - TREAT AS NORMAL EXIT
      EXSW = .TRUE.
      RETURN
C
C ... EXIT CODE = 1.
  100   ... do exit code 1 processing ...
      RETURN
C
C ... EXIT CODE = 2
  200   ... do exit code 2 processing ...
      RETURN
      END



The user may define exit code 1 to be a SQRT function, exit
code 2 to be a COS function, etc. Arguments and results may be passed
through the registers or memory, providing considerable flexibility
in the algorithm design and implementation.
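The dispatch logic described in this section can be modelled outside Fortran as a table of handlers. The sketch below is in Python for illustration only. The handler names, the convention that the argument and result travel in S1, and the reading that a false EXSW resumes the simulated program are all assumptions of this sketch, not statements about the simulator.

```python
import math

# Hypothetical handlers keyed by exit code. Codes 1 and 2 follow the SQRT and
# COS suggestion in the text; passing the value through S1 is an assumption.
def _sqrt(sreg):
    sreg[1] = math.sqrt(sreg[1])

def _cos(sreg):
    sreg[1] = math.cos(sreg[1])

HANDLERS = {1: _sqrt, 2: _cos}

def crayex(ijk, sreg):
    """Return the EXSW flag: True means 'treat as a normal program exit'."""
    if ijk == 0:
        return True              # EX 0 is a normal exit
    handler = HANDLERS.get(ijk)
    if handler is None:
        return True              # undefined exit code: normal exit
    handler(sreg)
    return False                 # assumed: control returns to the Cray program

sreg = [0.0] * 8
sreg[1] = 9.0
exsw = crayex(1, sreg)
print(exsw, sreg[1])
```

A dictionary plays the role of the computed GO TO here: adding an exit code means adding one entry rather than one more statement label.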

2.5  Report  Generation 

The Cray-M simulator produces five kinds of report output:
STAT, CPACT, TRACE, TACT and TACT STAT. The STAT report is a
summary report of the program's use of Cray-M resources, i.e., across
all Cray-1 processors. The CPACT report is a detailed report of
individual Cray-1 resource usage at each clock period of the program's
execution. The TRACE report is a flowtrace which, for each
executed instruction, displays for each CRAY-1 the instruction
mnemonic, the instruction address, and the contents of the storage
locations that the instruction affects. The TACT report shows
which task each CPU is executing at constant clock period
intervals, analogous to the CPACT clock-level report for each processor.
The TACT STAT report is a summary report of all task activity,
analogous to the STAT clock-level summary. For p processors, a
total of 2p + 3 reports can be generated for each run.

2.5.1  STAT  report 

The  STAT  report  summarizes  clock-level  activity  across  all 
processors  and  consists  of  three  sections: 

1)  Vector  Usage  Counts 

2)  Floating  Point  Result  Counts 

3)  Data  Traffic  Counts 

Timing  must  be  enabled  only  for  the  Vector  Usage  Counts  section. 

The Vector Usage Counts section reports the program's use of
the Cray vector unit resources. Figure 2.5.1, below, shows the
Vector Usage Counts table.


[Figure 2.5.1 - Vector Usage Counts Table. Report title: U OF M
CRAY-M SIMULATOR, VECTOR USAGE COUNTS. Columns: FP ADD, FP MUL,
FP DIV, LOG., SHIFT, I. ADD, V-LOAD, V-STOR. Rows under CUM. TIMING:
TIME BUSY (CP), % TIME BUSY, NO. RESULTS, NO. VECTORS, AVERAGE VL.
Lines beneath the table: RUN TIME (CP), MFLOPS, COMPOSITE AVL,
CONCURRENCY, MIPS.]


Each column of the table represents a different vector functional
unit. Left to right the units are: floating point add, floating
point multiply, floating point reciprocal approximation, logical,
shift, integer add and memory, split between vector loads and vector
stores. The rows of the table represent: unit busy time, percent
unit busy of total run time, the number of results produced by the
unit, the number of vector instructions issued to the unit and
the average vector length processed by the unit.

Five  other  statistics  are  printed  beneath  the  table:  the 
run  time  since  the  last  INIT  command  or  simulator  start  up,  the 
MFLOPS  (million  floating  point  operations  per  second)  for 
the  program,  the  composite  average  vector  length  over  all  vector 
units,  the  vector  unit  concurrency,  and  the  MIPS  rate. 

MFLOPS  is  calculated  over  all  floating  point  operations, 
both  vector  and  scalar.  It  is  computed  as  the  number  of  floating 
point  operations  divided  by  the  program  run  time  in  seconds. 

Concurrency  is  calculated  as  the  sum  of  all  vector  unit  busy 
times  divided  by  the  program  run  time.  It  is  a  global  measure  of 
the  concurrent  use  of  the  Cray-M  vector  units. 

MIPS, millions of instructions per second, is calculated as
the number of instructions issued divided by the program run time
in seconds.
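The three definitions above can be written out as a short sketch (Python, for illustration only). The run time is reported in clock periods, so a clock period length is needed to convert to seconds; the 12.5 ns value below is the Cray-1 clock period and is assumed to apply to the simulated processors. The function names and the sample inputs are hypothetical and are not taken from any report in this document.

```python
CLOCK_PERIOD_NS = 12.5  # Cray-1 clock period; assumed for the simulated CPUs

def run_time_seconds(run_time_cp):
    """Convert a run time in clock periods to seconds."""
    return run_time_cp * CLOCK_PERIOD_NS * 1e-9

def mflops(flop_count, run_time_cp):
    """Floating point operations divided by run time, in millions per second."""
    return flop_count / run_time_seconds(run_time_cp) / 1e6

def mips(issue_count, run_time_cp):
    """Instructions issued divided by run time, in millions per second."""
    return issue_count / run_time_seconds(run_time_cp) / 1e6

def concurrency(unit_busy_cp, run_time_cp):
    """Sum of the vector unit busy times divided by the run time."""
    return sum(unit_busy_cp) / run_time_cp

# Illustrative inputs only.
print(round(mflops(4000, 1000), 1))
print(round(mips(500, 1000), 1))
print(concurrency([600, 500, 0, 0, 0, 0, 300, 100], 1000))
```

A concurrency above one means that, on average, more than one vector unit was busy in each clock period.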

The  Floating  Point  Result  Counts  section  reports  the  program's 
use  of  both  vector  and  scalar  floating  point  operations.  For  each 
entry  in  the  table  (Figure  2.5.2)  both  the  number  of  results  and 
its  percentage  are  printed. 


                     FLOATING POINT RESULT COUNTS

              ADDITION        MULTIPLICATION  RECIPROCAL      TOTAL

VECTOR (%)     5530 ( 43.4)    7119 ( 55.9)     63 (  0.5)   12712 ( 99.9)
SCALAR (%)        5 (  0.0)       5 (  0.0)      8 (  0.1)      18 (  0.1)

TOTAL (%)      5535 ( 43.5)    7124 ( 56.0)     71 (  0.6)   12730 (100.0)


Figure 2.5.2 - Floating Point Result Counts Table

Floating  point  additions  (and  subtractions)  and  reciprocals  are 
counted  directly  from  the  instructions  that  perform  them,  but  the 
multiplication  count  requires  some  adjustment  due  to  the  reciprocal 
approximation. 

Because a reciprocation on the Cray-1 is an approximation,
two additional multiplications must be done to get a full precision
result. One of these multiplications is a reciprocal iteration and
the other is a standard multiplication. The Cray-1 instruction
sequence below illustrates the scalar instructions used to obtain a
full precision scalar reciprocal (S1 = 1/S2).

      S1   /HS2        reciprocal approximation
      S2   S2*IS1      reciprocal iteration
      S1   S2*S1       extend precision

To count these additional multiplies as part of the floating point
operation count would overstate this count, since they really are
part of a single reciprocal operation. Consequently these two
multiplies have been deducted from the multiply count in the table.
This adjustment is made by subtracting the number of detected
reciprocal iterations from the number of standard multiplications.
A reciprocal iteration is detected through the issue of a 067, 166
or 167 instruction. The sum of all vector and scalar floating
point operations, shown in the lower right corner of Figure
2.5.2, is used as the numerator of the MFLOPS calculation discussed
above. To receive a floating point result count report, the FULL
option must be specified on the STAT command.
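The adjustment can be sketched as follows (Python, for illustration only). The function names are hypothetical. The raw multiply count of 7195 is not printed anywhere in the report; it is back-computed here on the assumption that each of the 71 reciprocals in Figure 2.5.2 was completed by exactly one reciprocal iteration and one standard multiplication.

```python
def adjusted_multiplies(standard_multiplies, reciprocal_iterations):
    """Multiply count after removing the multiply that completes each reciprocal."""
    return standard_multiplies - reciprocal_iterations

def flop_total(additions, standard_multiplies, reciprocal_iterations,
               reciprocals):
    """Numerator of the MFLOPS calculation: adds + adjusted multiplies + reciprocals."""
    return (additions
            + adjusted_multiplies(standard_multiplies, reciprocal_iterations)
            + reciprocals)

# Totals row of Figure 2.5.2: 5535 additions, 71 reciprocals, and an assumed
# raw multiply count of 7124 + 71 = 7195 before the adjustment.
print(adjusted_multiplies(7195, 71))
print(flop_total(5535, 7195, 71, 71))
```

Under that assumption the sketch reproduces the 7124 multiplications and the 12730 total shown in the figure.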

The  Data  Traffic  Counts  section  is  the  last  section  of  the 
STAT  report.  It  is  only  printed  if  the  FULL  option  is  specified  on 
the  STAT  command.  Figure  2.5.3,  on  the  following  page,  is  an  example 
of  the  Data  Traffic  Counts  section. 

This section reports the amount of data traffic on the major
data paths of the Cray-M. To aid in identifying the various data
paths for which traffic information is provided, the simulator prints
a block diagram of a Cray-1 uni-processor and attaches path
labels to each of the data paths. These path labels are referenced
on the left hand side of the report preceding a number, representing
the number of operands shipped over that path. The data paths with
arrows are uni-directional whereas the paths shown dotted are
bi-directional.
The left most column of the figure represents the Cray-M
computational units divided into three groups: vector, scalar and address.
The floating point functional units are assumed to be shared between
the vector and scalar groups.

The center column of the figure represents the Cray-M register
storage. Top to bottom these four register groups are the vector
registers, the scalar registers, the T and B registers and the address
registers. The vertical bi-directional communication paths (shown dotted)
between the four register groups are used for inter-group data
transfers.

The  right  hand  column  of  the  figure  represents  Cray-M  main 
memory.  MEMORY  is  shown  in  four  sections  only  for  the  purpose  of 
the  figure.  Any  register  group  may  reference  any  location  in  Cray-1 
main  memory. 

The  labeling  scheme  is  defined  as  follows: 

1)  "A"  means  address,  "S"  means  scalar  and  "V"  means 
vector. 

2) "O" means operands and "R" means result.

3) "X" means a bi-directional data path.

4)  "M"  means  the  path  is  a  memory  path  used  by  the  three 
register  groups  tied  both  to  memory  and  a  computational 
unit.  The  T  and  B  registers  communicate  only  with 
memory  and  other  register  groups. 

For example, "SMO" is the operand data path to the scalar registers
from memory, whereas "SO" is the operand data path to the scalar
computation units from the scalar registers.

Below  the  data  path  portion  of  the  report,  four  other  statistics 
are  printed: 

1)  MISC.  represents  the  number  of  miscellaneous  instructions 
executed  by  the  program  that  do  not  move  data  across  any 
of  the  paths  shown  in  the  figure  and  are  not  branch 
instructions.  The  instructions  counted  include:  op-codes 
2-4,  20-22,  40-43,  72-73. 

2) BRANCHES represent the number of branch instructions
executed by the program whether the branch is taken or not.

3)  FETCHES  represent  the  number  of  parcel  buffer  fetches 
incurred  by  the  running  program. 

4)  ISSUES  represent  the  number  of  instructions  issued  by 
the  running  program. 

The last part of the Data Traffic Counts section shows nine
percentage and ratio calculations. Each of these is discussed below
with its derivation.
1. Percent of vector operands supplied by cache.

The term cache refers to the eight Cray-1 vector registers.
This percentage reflects the dominance of the
cache over memory in supplying vector operands to the
vector units. It is defined as,

      (VO - VMO) / VO * 100

2. Percent of total vector traffic supplied by cache.

This percentage is similar to (1) above, but also includes
the effect of the vector results data traffic. It is
defined as,

      (VO - VMO + VR - VMR) / (VO + VR) * 100

3. Percent vector results of total results.

This percentage is a measure of the vector-scalar composition
of the program's computation. This figure reflects
the percentage of all results computed in vector mode.
Because scalar and vector instructions can execute
concurrently, this figure is not the percentage of time
spent in vector mode. This figure is defined as,

      VR / (VR + SR + AR) * 100

4.  Percent  vector  memory  traffic  of  total  memory  traffic. 

This  percentage  is  a  measure  of  the  vector-scalar  composi¬ 
tion  of  the  program's  memory  usage.  This  figure  reflects 
the  percentage  of  vector  traffic  to  and  from  the  main 
memory.  It  is  defined  as, 


      (VMO + VMR) / (TMDT + FETCH) * 100

FETCH is the number of memory words read into the instruction
parcel buffers. TMDT is the total memory data traffic
and is defined as,

      TMDT = VMO+VMR + SMO+SMR + AMO+AMR + BTO+BTR

5.  Ratio  of  computation  traffic  to  memory  traffic. 

This ratio is a measure of the benefit provided by the
register portion of the Cray-1 memory hierarchy in reducing
the main memory data traffic. If this ratio were
one there would be no benefit in having the registers,
since register traffic would equal memory traffic. Typically
this ratio is in the range of two to five, indicating
that the registers provide a substantial reduction in
main memory data traffic. This ratio is defined as,

      (VO+VR + SO+SR + AO+AR) / TMDT

6.  Ratio  of  vector  memory  operands  to  vector  memory  results. 

This ratio is a measure of the average vector operand
requirements of the program. This ratio combined with the
vector memory result rate (see 10 below) and the algorithmic
complexity of main memory usage (the computational lifetime of
data in main memory) will allow the algorithm designer
to determine the mass storage I/O data rates necessary
to keep the vector arithmetic units constantly busy. This
ratio is defined as,

      VMO / VMR

7.  Ratio  of  vector  unit  results  to  vector  memory  operands. 

This  ratio  is  a  figure  of  merit  of  the  average  value  of 
main  memory  operands  in  the  computation.  A  value  of  two 
would  imply  that  each  main  memory  operand  precipitates 
2.5.1.1 STAT Example

To illustrate the information provided by the STAT command,
one example is presented. The code in this example
multiplies four pairs of matrices together. After running the
program with timing turned on (SET TIM=ON), the STAT FULL command
is given, producing a STAT report containing all three sections
(Vector Usage, Floating Point Result and Data Traffic).


[Vector Usage Counts section of the example report, headed U OF M
CRAY-M SIMULATOR, VECTOR USAGE COUNTS. Legible column headings
include SHIFT, I. ADD, V-LOAD and V-STOR; legible row and summary
labels include TIME BUSY (CP), % TIME BUSY, NO. VECTORS,
RUN TIME (CP), MFLOPS, CONCURRENCY and MIPS. Legible entries:
V-LOAD 128 and V-STOR 64, with 2 and 1 vectors; MIPS 16.16.]


                     FLOATING POINT RESULT COUNTS

              ADDITION        MULTIPLICATION  RECIPROCAL      TOTAL

VECTOR (%)      128 ( 66.7)      64 ( 33.3)      0 (  0.0)     192 (100.0)
SCALAR (%)        0 (  0.0)       0 (  0.0)      0 (  0.0)       0 (  0.0)

TOTAL (%)       128 ( 66.7)      64 ( 33.3)      0 (  0.0)     192 (100.0)


Figure 2.5.4 - STAT Example
MISC.    : 13
BRANCHES : 4
FETCHES  : 3
ISSUES   : 79

PERCENT OF VECTOR OPERANDS SUPPLIED BY CACHE - 75.00%
PERCENT OF TOTAL VECTOR TRAFFIC SUPPLIED BY CACHE - 76.92%
PERCENT VECTOR RESULTS OF TOTAL RESULTS - 92.49%
PERCENT VECTOR MEMORY TRAFFIC OF TOTAL MEMORY TRAFFIC - 79.34%
RATIO OF COMPUTATION TRAFFIC TO MEMORY TRAFFIC - 4.69
RATIO OF VECTOR MEMORY OPERANDS TO VECTOR MEMORY RESULTS - 2.00
RATIO OF VECTOR UNIT RESULTS TO VECTOR MEMORY OPERANDS - 2.50
RATIO OF VECTOR UNIT RESULTS TO VECTOR MEMORY RESULTS - 5.00
RATIO OF VECTOR UNIT CACHE USE TO VECTOR MEMORY CACHE USE - 4.33
VECTOR MEMORY RESULT RATE - 0.1637 RESULTS/CP


Figure 2.5.4 - STAT Example (contd)
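As a cross-check, several of the printed figures can be reproduced from a small set of path counts. The sketch below is in Python for illustration only and rests on stated assumptions: VMO and VMR are taken to be the V-LOAD and V-STOR result counts (128 and 64), VO and VR are inferred from the printed ratios, the 391 CP run time is inferred from the printed result rate, and the 12.5 ns clock period is that of the Cray-1. The last four formulas are inferred from the report labels rather than taken from the text.

```python
# Assumed path counts for the STAT example (see the lead-in for their origin).
VO, VR, VMO, VMR = 512, 320, 128, 64
RUN_TIME_CP = 391          # inferred, not printed in the surviving report
ISSUES = 79                # printed under ISSUES
CLOCK_PERIOD_NS = 12.5     # Cray-1 clock period (assumed)

pct_operands_from_cache = (VO - VMO) / VO * 100
pct_traffic_from_cache = (VO - VMO + VR - VMR) / (VO + VR) * 100
operands_per_memory_result = VMO / VMR

# Formulas below are inferred from the report labels.
results_per_memory_operand = VR / VMO
results_per_memory_result = VR / VMR
cache_use_ratio = (VO + VR) / (VMO + VMR)
memory_result_rate = VMR / RUN_TIME_CP
mips = ISSUES / (RUN_TIME_CP * CLOCK_PERIOD_NS * 1e-9) / 1e6

print("%.2f %.2f" % (pct_operands_from_cache, pct_traffic_from_cache))
print("%.2f %.2f %.2f" % (operands_per_memory_result,
                          results_per_memory_operand,
                          results_per_memory_result))
print("%.2f %.4f %.2f" % (cache_use_ratio, memory_result_rate, mips))
```

With these inputs the sketch yields 75.00, 76.92, 2.00, 2.50, 5.00, 4.33, 0.1637 and 16.16, which agree with the corresponding lines of the example report.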


The CPACT report produces a detailed clock period activity
record of a Cray-1 uniprocessor state. This is a 132 column report
suitable only for printing on a line printer. The CPACT report
can be enabled or disabled for any or all of the CPU's. Figure
2.5.4 on the following page shows the format of the report. Across
the top of the report, the various column headings are devoted to the
Cray-1 resources that may be called into use by a Cray-1
instruction. Time flows down the page with each clock period of simulation
time producing an output record that describes the state of Cray-1
resources at that clock period. With vector instructions using
long vector lengths, the machine resource state may remain unchanged
for fifty or more clock periods, resulting in many identical CPACT
output records. The COMPRESS option on the CPACT command (see
section 3) may be used to suppress the printing of ten or more
identical output records. This substantially reduces simulation
cost and makes the CPACT report far more manageable. One line of
compression dots is printed in place of the suppressed records.
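The effect of the COMPRESS option can be pictured as a run-length filter over the output records. The sketch below (Python, for illustration only) is not the simulator's code. The function name, the decision to keep one copy of the repeated record ahead of the dot line, and the exact dot string are assumptions.

```python
def compress(records, threshold=10, dots="." * 20):
    """Collapse each run of `threshold` or more identical records.

    Assumption of this sketch: the first record of the run is kept and the
    remainder is replaced by a single line of dots.
    """
    out = []
    i = 0
    while i < len(records):
        j = i
        while j < len(records) and records[j] == records[i]:
            j += 1                       # j ends one past the run
        if j - i >= threshold:
            out.append(records[i])       # one copy of the repeated record
            out.append(dots)             # stands for the suppressed records
        else:
            out.extend(records[i:j])     # short runs are printed in full
        i = j
    return out


report = ["IS A"] + ["   A"] * 60 + ["IS B"]
print(len(report), "->", len(compress(report)))
```

A vector instruction with VL = 64 that leaves the resource state unchanged for 60 clock periods then costs two lines instead of 60.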

The  CPACT  report  is  partitioned  into  the  following  21  Cray-1 
resource  fields: 

1.  ST.  -  The  machine  state  field. 

This  field  indicates  the  machine  state  at  each  clock 
period.  Three  possible  entries  are:  (1)  "IS",  which 
means  that  an  instruction  is  issuing  at  this  clock 
period,  (2)  blank  which  means  that  no  instruction  will 
issue  at  this  clock  period,  and,  (3)  "FE",  which  means 
a  parcel  buffer  fetch  sequence  is  initiated  at  this 
clock  period. 

2.  TAG  -  The  activity  resource  tag. 

At  a  clock  period  in  which  a  new  machine  activity 
(instruction  issue  or  fetch  request)  is  initiated,  the 
activity  is  assigned  a  one  letter  activity  resource 
tag  (A-Z,  0-9)  which  is  used  in  subsequent  clock  periods 
to  identify  the  Cray-1  resources  called  into  use  by  the 
initiated activity. When a conflict occurs in the demand
for a Cray-1 resource, the tag occupying the resource
may be traced back to the initiating activity.

Resource conflict occurs when an activity initiated
in a past clock period occupies a Cray-1 resource
that is now being demanded by another activity. For
example, when an arithmetic instruction issues, the
result register is reserved until the result arrives
at the register. Because the Cray-1 is pipelined,
a subsequent instruction that requires the previous
arithmetic result as an input operand may experience
an operand register conflict, causing it to hold issue
until the previous arithmetic result arrives at the
operand register (the previous instruction's result
register). In this example the resource conflict occurs
on a register. The CPACT report will show the first
instruction's activity tag in the report column corresponding
to the result register of the instruction. The tag will
remain in this column until the result register reservation
expires (i.e., the data has arrived). If the second instruction
demands the use of this result register before the
reservation has expired, the result register reservation tag
will be underscored and the second instruction will hold
issue until the data arrives.

The  underscoring  of  activity  tags  is  used  throughout  the 
report  to  highlight  the  resource  conflicts  of  waiting 
instructions. 

3.  INSTRUCTION  -  The  mnemonic  for  the  issuing  instruction. 

When a Cray-1 instruction issues ("IS" in machine state
field), the assembly mnemonic for the instruction is
printed in this column.

4.  P-ADDR  -  The  parcel  address  of  the  issuing  instruction. 

When  a  Cray-1  instruction  issues,  the  parcel  address 
from  where  it  came  is  printed  in  this  column. 
5. CP - The simulator clock period.

This column contains the simulator clock period. The
clock period is reset to zero by an INIT command. If
the user turns timing on and off through the SET command
or the ERT, DRT instructions, the clock period is not
affected but the machine resource state is cleared.

6. +*/&>+ - The Cray-1 vector functional units.

Each of these six columns represents the reservation
state of a Cray-1 vector functional unit. Left to
right the units are: floating point adder, floating
point multiplier, floating point reciprocal approximation,
vector logical, vector shift, vector integer
adder. The activity tag of a vector instruction which
reserves one of these functional units will be placed
in the corresponding column.

The vector memory path can also be reserved by a vector
instruction and is shown in one of the far right columns
under the heading "BSF", which stands for block sequence
flag. This flag is set during all vector memory
references.

7. V. REG - The eight Cray-1 vector registers.

Each of these eight columns represents the reservation
state of a Cray-1 vector register. Vector registers
are reserved by the vector instructions which reference
them either as operand or result registers. The activity
tag of the issuing vector instruction will be placed in
the columns corresponding to the vector registers used
by the instruction. Operand registers are typically
reserved for MAX(VL,5) clock periods. Result registers
are typically reserved for MAX(VL,5) + FUT + 2 clock
periods, where FUT is the functional unit time of the vector
unit performing the vector operation.
If a subsequent vector instruction requires, as an
input vector, the result vector of a previous vector
instruction, and is ready to issue when the prior
instruction's first result arrives at its vector
register (the first result will arrive FUT + 2
clock periods after issue), then the second vector
instruction will issue only at the clock period when
this first result arrives. This is called chaining
and the clock period when the first result arrives
is called chain slot time. If the second vector
instruction misses chain slot time, it will hold issue
until all results of the first instruction have
arrived at the vector register.

If  the  second  vector  instruction  chains  to  the  first, 
the  activity  tag  of  the  second  instruction  will  replace 
the  tag  of  the  first  instruction  in  the  chained  vector 
register  column.  If  chain  slot  time  is  missed,  an 
asterisk  is  placed  in  the  result  register  field  at  the 
chain  slot  time  clock  period,  highlighting  chain  slot 
time. 

See Appendix A for a summary of Cray-1 timing
information.
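The reservation and chaining rules of this field can be condensed into a small timing sketch (Python, for illustration only). The function names are hypothetical, the rules are the "typical" ones quoted above, and the sketch ignores every other source of issue delay. The value FUT = 6 used in the example is the Cray-1 floating point add unit time and is an assumption here.

```python
def operand_reservation(vl):
    """Clock periods an operand vector register stays reserved."""
    return max(vl, 5)

def result_reservation(vl, fut):
    """Clock periods a result vector register stays reserved."""
    return max(vl, 5) + fut + 2

def second_issue_time(first_issue_cp, ready_cp, vl, fut):
    """Issue clock period of an instruction that reads the first one's result.

    ready_cp is the clock period at which the second instruction could issue
    if the result register were its only constraint.
    """
    chain_slot = first_issue_cp + fut + 2
    if ready_cp <= chain_slot:
        return chain_slot                 # chains at chain slot time
    # Chain slot missed: wait until all results have arrived.
    return max(ready_cp, first_issue_cp + result_reservation(vl, fut))


# First instruction issues at CP 0 with VL = 64 on a unit with FUT = 6.
print(second_issue_time(0, 5, 64, 6))     # ready in time: chains
print(second_issue_time(0, 9, 64, 6))     # one CP late: waits for all results
```

In this sketch, missing chain slot time by a single clock period delays the second instruction from clock period 8 to clock period 72.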

8.  MEMORY  BANKS  -  The  Cray-1  rank  registers  and  memory  banks. 

This portion represents the Cray-1 scalar memory reference
access network and the 16 Cray-1 memory banks. The
memory bank cycle time of the Cray-1 is four clock periods
long. Consequently, memory accesses to the same
bank must be at least four clock periods apart. Two scalar
memory references which could address the same bank can
issue two clock periods apart. This would give rise to a
bank conflict which is resolved by the scalar rank register
access network. (I/O access to main memory also passes
through the rank registers.) The four columns to the left
of the 16 memory bank columns represent:
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SCI  -  Scalar  in  clock  period  one. 

RKA  -  Rank  register  -  A 
RKB  -  Rank  register  -  B 
RKC  -  Rank  register  -  C 

When a scalar memory reference issues, its activity
tag is placed in the column SCI. Its bank address
(lower 4 bits) is then compared to the bank addresses
in rank registers A, B and C. If a bank coincidence
is detected, the memory address waits at SCI until a
clock period arrives when bank coincidence vanishes.
Meanwhile the bank addresses in the rank registers are
advanced each clock period to the next rank register.

The address in rank-C advances to its target memory
bank and remains latched at that bank for four clock
periods. On the fifth clock period, the memory data
is gated from the bank into the SEC-DED (single error
correction - double error detection) network.
Simultaneously a new memory address may be latched onto the
bank to start the next reference.

While  the  address  is  waiting  in  SCI,  the  activity  tag 
of  the  issuing  instruction  is  placed  in  one  of  the  far 
right  columns  labeled  "STH",  which  means  storage  hold. 
While  a  scalar  memory  reference  is  waiting  in  storage 
hold,  subsequent  scalar  memory  references  may  not  issue 
until  the  waiting  scalar  reference  leaves  the  storage 
hold  state.  This  means  that  two  scalar  memory  reference 
instructions,  accessing  the  same  bank,  may  issue  so  as 
not  to  block  subsequent  instructions  from  issuing.  But, 
if  a  third  scalar  memory  reference  tries  to  access  the 
same  bank  as  the  first  two  references,  the  storage  hold 
state  of  the  second  scalar  reference  will  block  issue 
of  the  third  scalar  reference  which  blocks  all  instruc¬ 
tion  issuing. 

As the memory address advances through the rank registers,
the activity tag of the issued memory reference is
advanced to the right. When the tag leaves rank-C, it
will jump to its target bank and remain there for
four clock periods.

Only  scalar  memory  references  place  tags  in  this 
section  of  CPACT.  Parcel  buffer  fetches,  B  and  T- 
register  transfers  and  vector  references  place  no 
tags  in  this  section.  They  do  affect  other  columns 
of  the  report,  though. 
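
The bank coincidence test described above can be illustrated with a
small Python model. This is a simplified sketch, not the simulator's
implementation: the class and function names are hypothetical, a new
reference is presented on the very next clock period, and the four
clock period bank busy time is ignored.

```python
N_BANKS = 16  # the Cray-1 has 16 memory banks


def bank_of(address: int) -> int:
    """Bank number is the lower 4 bits of the word address."""
    return address & (N_BANKS - 1)


class RankRegisters:
    """Bank numbers currently held in rank A, B and C (None means empty)."""

    def __init__(self) -> None:
        self.ranks = [None, None, None]  # [RKA, RKB, RKC]

    def coincides(self, address: int) -> bool:
        """True if the address targets a bank already in a rank register."""
        return bank_of(address) in self.ranks

    def advance(self, entering=None):
        """Shift A -> B -> C by one clock period; return the bank leaving C."""
        leaving = self.ranks[2]
        self.ranks = [entering, self.ranks[0], self.ranks[1]]
        return leaving


def storage_hold_periods(rr: RankRegisters, address: int) -> int:
    """Clock periods the reference waits at SCI before entering rank A."""
    held = 0
    while rr.coincides(address):
        rr.advance()  # the ranks keep moving while the reference waits
        held += 1
    rr.advance(bank_of(address))
    return held


if __name__ == "__main__":
    rr = RankRegisters()
    # Octal addresses 33 and 53 share the same lower 4 bits (bank 13 octal).
    for addr in (0o33, 0o53, 0o34):
        print(oct(addr), "bank", oct(bank_of(addr)),
              "held", storage_hold_periods(rr, addr))
```

In this model a reference to a bank that was entered on the previous
clock period waits until that bank number has left rank C.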

9.  ARA - The A-register access path busy flag.

There is a single store access path to the eight
Cray-1 address registers. Each clock period, one
operand may be stored into one of the eight A-registers
via this path. When an instruction tries to issue, if
the result of its computation would make use of the
A-register access path at a future clock period when
the path is already reserved for use by a prior
instruction, the issue will be held until the next clock period.
When an instruction uses the access path, its activity
tag will appear for one clock period.

10.  A. REG - The eight Cray-1 address registers.

These eight columns correspond to the eight Cray-1
A-registers. When an instruction reserves an A-register,
its activity tag will appear in the appropriate column.

11.  SRA - The S-register access path busy flag.

This flag serves the same function for the eight S-registers
that the ARA flag does for the A-registers.

12.  S. REG - The eight Cray-1 scalar registers.

These eight columns correspond to the eight Cray-1
S-registers. When an instruction reserves an S-register,
its activity tag will appear in the appropriate column.

13.  VM - The vector mask busy flag.

This  flag  is  set  when  a  003  instruction  (VM  Sj)  or 
a  175  instruction  (VM  Vj,C)  is  in  progress.  The 
activity  tag  of  the  issuing  instruction  appears  in 
this  column. 

14.  A0B - A0 busy flag.

The A-register conditional branch instructions, 010-
013, use the data in A0 to make branch decisions.
When new data is stored into A0, it takes two additional
clock periods to validate the branch test flags. While
the branch test flags are invalid, the A-register
conditional branch instructions will hold issue. The
activity tag of the instruction storing new data into
A0 will appear in this column until the branch test
flags are made valid.
15.  S0B - S0 busy flag.

The same comments for A0B above apply here, except
that the S-register conditional branch instructions (014-
017) are affected.

16.  STH - Storage hold flag.

The activity tag for a scalar memory reference whose
address experiences a memory bank conflict with the
rank registers is placed in this column until the
conflict vanishes. Subsequent scalar memory references
will hold issue until this flag clears. See the memory
banks discussion for more details.
17.  073 - Vector mask read inhibit flag.

Execution of a 003 instruction (VM Sj) or a 175
instruction (VM Vj,C) will cause a 073 instruction (Si VM) to
hold issue until the 003 or 175 finishes. The activity
tag of the 003 or 175 instruction will appear in this column.

18.  BCG - The parcel buffer change flag.

When the next instruction parcel to enter the NIP
(next instruction parcel) register is not in the
current parcel buffer, due to a branch or a buffer
fall through, this column will be tagged with an
asterisk until the buffer change is completed. This
may involve a switch to one of the other parcel buffers,
or a parcel buffer memory fetch may be needed. This
flag causes all instructions to hold issue.

19.  FPA - Fetch pause flag.

This flag is used to trigger a parcel buffer fetch
sequence. While it is up, an asterisk appears in this
column. The fetch sequence will begin when this flag
is clear.

20.  BSF - The vector memory reference block sequence flag.

When a vector memory reference (176,177) issues, this
column will be set with the activity tag of the issuing
instruction. Subsequent vector memory references will
hold issue until this flag clears.

21.  BTX - The B and T register block transfer flag.

When a B or T register block transfer instruction (034-
037) issues, its activity tag will appear in this
column. No other instruction may issue while a B or T
block transfer is in progress.


2.5.2  CPACT  Examples 

To  illustrate  the  use  of  the  various  CPACT  report  fields  two 
examples  are  presented,  a  scalar  memory  reference  example  and  a 
vector  example. 

2.5.2.1  Scalar Example

The  CPACT  report  shown  in  figure  2.5.5  (wide  and  narrow  versions) 
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the  CPACT  report.  These  tags  will  be  used  to  refer  to  the  instruc¬ 
tions  in  the  discussion  below. 

When the program is first started up, no instruction parcels
are in the parcel buffers, so a fetch is required. The asterisk
in the BCG field indicates a buffer change in progress. When the
first instruction parcel (20A) arrives at the parcel buffer, two
pass instructions ("<BLANK>") issue, which pull the parcel through
the NIP to the CIP register. Instruction B issues at clock period
17 (CP 17) and its activity tag appears in SCI to indicate a scalar
memory reference in clock period one. As time advances, the B tag
propagates through the rank registers and onto memory bank 13 octal
(B hexadecimal). After four clock periods on the memory bank, the
B tag vanishes and appears later at clock period 27. Here the
instruction makes use of the S-register access path so the memory
data can be stored into the result register, S3 in this case.

Figure 2.5.6 - CPACT Scalar Example, Wide version

During the time period from instruction issue (CP 17) until the
data arrives at the result register (CP 27), the B tag appears
in the S-register 3 column, showing the reservation on S3. Because
a scalar memory reference instruction is a two parcel instruction,
a <BLANK> will issue after it. This is true for all two parcel
instructions.

When scalar reference instruction C issues, a bank conflict
with the previous instruction is detected (addresses 33 and 53 are
in the same bank). This causes the reference to enter the storage
hold state until the conflict vanishes; meanwhile the C tag appears
in the STH column. Scalar reference instruction D will try to issue
at CP 21, but cannot because the storage hold flag is set. This
is noted by the underscore beneath the C tag in the STH column.

The subsequent four scalar references (D, E, F and G) all
reference available memory banks and issue consecutively without
conflict. Instruction E loads register S0, which is needed by
branch instruction H to decide the branch outcome. Even though
the data for E arrives at CP 35, two more clock periods are required
until the branch condition flags become valid. While the branch
flag is invalid, the E tag appears in the S0 busy column (S0B). Once
the S0B flag clears, the branch instruction issues at CP 38. This
branch instruction has an in-buffer target address. While the
buffer change is in progress, the instruction causing the change
will place its tag in the BCG column. Once the change is complete,
BCG is cleared and two blanks issue to load the CIP register.
2.5.2.2  Vector Example

This  example  illustrates  the  CPACT  report  with  vector  instruc¬ 
tions.  All  vector  lengths  are  seven.  Figure  2.5.6  is  the  CPACT 

As in the scalar example, the simulation begins with a fetch
sequence. The first vector instruction (F) loads a vector from
memory into V0. Rank-B and rank-C busy are hold issue conditions
for this vector load and are shown underscored. Once the vector
load issues, it places its tag in both the V0 busy and the block
sequence flag (BSF) columns. Vector instruction G is also a vector
memory load, but it must hold issue until BSF clears. The asterisk
in the V0 column at clock period 33 represents the chain slot clock
period for instruction F. If a vector instruction using V0 as an
operand were placed after instruction F and met all other
conditions at CP 33, it would issue at CP 33 and be chained to F. At
CP 35 BSF clears, allowing instruction G (a second vector memory
load) to issue.

Instruction H is a vector integer add with vector registers V1
and V0 as operands. Consequently, the busy states of V1 and V0 are
hold issue conditions for H. The functional unit time for the vector
load (G) is seven clock periods, so at CP 44, (issue time) + (functional
unit time) + (2), chain slot time will occur and the vector add
issues. The tag for the chaining instruction (H) replaces the result
tag (G) on the chained register.
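
As a check on the arithmetic, and assuming that G issues in the same
clock period in which BSF clears (CP 35), the chain slot time for G
works out as follows. The symbol names are introduced here only for
illustration.

```latex
t_{\mathrm{chain}} = t_{\mathrm{issue}} + \mathrm{FUT} + 2 = 35 + 7 + 2 = 44
```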
Figure 2.5.7 - CPACT Vector Example

Instruction I chains in a similar way to the result of
instruction H. The vector mask register is reserved by I and its tag is
placed in the VM column.

Instruction J will hold issue until the 073 inhibit flag clears.
S0 is used as data for branch instruction K, which does a branch out
of buffer, causing a fetch sequence.
2.5.3  TRACE Report

The TRACE report is an instruction-by-instruction flowtrace
of the program being executed. As each instruction issues, its
effects are displayed. The TRACE report can be enabled
or disabled for any or all of the CPU's. The TRACE report, unlike
the STAT report, is independent of the TIMING switch. The format
of the TRACE command, which is somewhat complicated, will not be
discussed here; see instead Chapter 3 of this report. Some
examples of the use of the TRACE command are presented here. The sample
code is the same as that used in the STAT example (section 2.5.1.1).


The TRACE report produces, for each instruction, one or more
printed lines. The output for each instruction is divided into three
fields: address, mnemonic and display; the fields are separated by
colons. The address field contains the parcel address of the instruction,
and the mnemonic field contains the CAL (Cray-1 Assembly Language)
mnemonic of the instruction. Thus, looking at (1) in
the figure, we see that at parcel address 20B is the CAL instruction
A2 7750,A0 (opcode 10h). Note that all constants displayed in both the
address and mnemonic fields are octal constants, regardless of what
BASE pseudo-op was used for assembly of the source code. Therefore,
since $7750_8 = 4072_{10}$, the memory reference will not exceed the
address space ($4096_{10}$ words) provided that
$-4072_{10} \le A0 \le 23_{10}$.
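
The octal conversion and the resulting bound on A0 can be checked with
a short Python sketch. The function names are hypothetical, and the
sketch assumes that the effective address is simply the displacement
plus A0 and that valid word addresses run from 0 to 4095.

```python
MEMORY_WORDS = 4096  # size of the address space in words (decimal)


def effective_address(displacement: int, a0: int) -> int:
    """Address referenced by an instruction of the form  Ai exp,A0."""
    return displacement + a0


def in_address_space(displacement: int, a0: int,
                     memory_words: int = MEMORY_WORDS) -> bool:
    """True if the reference falls inside words 0 .. memory_words - 1."""
    return 0 <= effective_address(displacement, a0) < memory_words


if __name__ == "__main__":
    disp = int("7750", 8)  # constants in the TRACE report are octal
    lowest = -disp
    highest = MEMORY_WORDS - 1 - disp
    print(disp, lowest, highest)
```

For the displacement 7750 (octal) the lowest and highest values of A0
that keep the reference in range are the two limits quoted above.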

The third field in the TRACE output line (or lines) is the
display field. In the display field is printed the contents of storage
locations affected by the instruction. The results displayed are
results after the instruction has executed; that is, the results
which the instruction produced. In (1), this means that A0=0 and
that the word stored in A2 after being fetched from memory is also
zero. For every instruction, the result register is displayed unless
the instruction is a cache transfer instruction (opcodes 025 and 075),
in which case the contents of the A or S register transferred into
the B or T cache is displayed rather than the actual B or T result
register. As an example, see (2); here the contents of A4 rather
than the contents of B05 are displayed.

All displayed results are decimal integers or decimal floating
point numbers. In (3), the result is shown to be $2023_{10}$. Exceptions
to the decimal-display rule are the following:

1)  S registers are displayed as both decimal integers and
decimal floating-point numbers unless the instruction
is a logical instruction (shift, mask, AND, etc.), in which
case the result is displayed as a 64-bit octal constant.

2)  V registers are displayed as octal constants for logical
instructions, and decimal floating-point numbers otherwise
(contrast (4) and (5)).

Vector instructions display a number of elements which is equal
to the minimum of VL and the number specified by the optional LEN
parameter on the TRACE command. Note that in (4) and (5) all 64
elements are printed since VL=64 and no LEN parameter was supplied.
(6) illustrates a TRACE command with a LEN parameter given; since a
LEN is given, (7) (which is the same instruction as (4)) displays
only four elements of the vector result register.
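
The element-count rule can be written out as a small Python sketch.
The function names and the exact spacing of the output are assumptions
made for illustration; the three-elements-per-line layout follows the
vector displays in the TRACE example.

```python
def elements_displayed(vl: int, trace_len=None) -> int:
    """Number of vector elements shown: min(VL, LEN), or VL if no LEN given."""
    if trace_len is None:
        return vl
    return min(vl, trace_len)


def format_vector(name: str, values, vl: int, trace_len=None, per_line: int = 3):
    """Return the display lines for a vector register, per_line elements each."""
    count = elements_displayed(vl, trace_len)
    cells = [f"{name}({i:2d})= {values[i]}" for i in range(count)]
    return ["   ".join(cells[i:i + per_line])
            for i in range(0, count, per_line)]


if __name__ == "__main__":
    v0 = [0.0] * 64
    for line in format_vector("V0", v0, vl=64, trace_len=4):
        print(line)
```

With VL=64 and no LEN parameter the same call produces 22 lines, the
last of which holds the single element 63.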


. MAP
MODULE    LOCATION    LENGTH
MMUL      20A         202

. TRACE ON
. RU #25
20A : A0   00        : A0=    0
20B : A2   7750,A0   : A0=    0  A2=    0             <-- (1)
20D : A4   7751,A0   : A0=    0  A4=    0
21B : B05  A4        : A4=    0                       <-- (2)
21C : A4   A4*A2     : A4=    0
21D : A7   3747      : A7= 2023                       <-- (3)
22B : B02  A7        : A7= 2023
22C : A7   5747      : A7= 3047
23A : B04  A7        : A7= 3047
23B : A5   01        : A5=    1
23C : B20  A5        : A5=    1
23D : VL   A4        : VL=64
24A : A1   00        : A1=    0
24B : B07  A1        : A1=    0
24C : V3   S0&V4     : VL=64                          <-- (4)
      V3( 0)= O'0'    V3( 1)= O'0'    V3( 2)= O'0'
      V3( 3)= O'0'    V3( 4)= O'0'    V3( 5)= O'0'
      V3( 6)= O'0'    V3( 7)= O'0'    V3( 8)= O'0'
      V3( 9)= O'0'    V3(10)= O'0'    V3(11)= O'0'
      V3(12)= O'0'    V3(13)= O'0'    V3(14)= O'0'
      V3(15)= O'0'    V3(16)= O'0'    V3(17)= O'0'
      V3(18)= O'0'    V3(19)= O'0'    V3(20)= O'0'
      V3(21)= O'0'    V3(22)= O'0'    V3(23)= O'0'
      V3(24)= O'0'    V3(25)= O'0'    V3(26)= O'0'
      V3(27)= O'0'    V3(28)= O'0'    V3(29)= O'0'
      V3(30)= O'0'    V3(31)= O'0'    V3(32)= O'0'
      V3(33)= O'0'    V3(34)= O'0'    V3(35)= O'0'
      V3(36)= O'0'    V3(37)= O'0'    V3(38)= O'0'
      V3(39)= O'0'    V3(40)= O'0'    V3(41)= O'0'
      V3(42)= O'0'    V3(43)= O'0'    V3(44)= O'0'
      V3(45)= O'0'    V3(46)= O'0'    V3(47)= O'0'
      V3(48)= O'0'    V3(49)= O'0'    V3(50)= O'0'
      V3(51)= O'0'    V3(52)= O'0'    V3(53)= O'0'
      V3(54)= O'0'    V3(55)= O'0'    V3(56)= O'0'
      V3(57)= O'0'    V3(58)= O'0'    V3(59)= O'0'
      V3(60)= O'0'    V3(61)= O'0'    V3(62)= O'0'
      V3(63)= O'0'
24D : V0   S0+FV3    : VL=64                          <-- (5)
      V0( 0)= 0.0     V0( 1)= 0.0     V0( 2)= 0.0
      V0( 3)= 0.0     V0( 4)= 0.0     V0( 5)= 0.0
      V0( 6)= 0.0     V0( 7)= 0.0     V0( 8)= 0.0
      V0( 9)= 0.0     V0(10)= 0.0     V0(11)= 0.0
      V0(12)= 0.0     V0(13)= 0.0     V0(14)= 0.0
      V0(15)= 0.0     V0(16)= 0.0     V0(17)= 0.0
      V0(18)= 0.0     V0(19)= 0.0     V0(20)= 0.0
      V0(21)= 0.0     V0(22)= 0.0     V0(23)= 0.0
      V0(24)= 0.0     V0(25)= 0.0     V0(26)= 0.0
      V0(27)= 0.0     V0(28)= 0.0     V0(29)= 0.0
      V0(30)= 0.0     V0(31)= 0.0     V0(32)= 0.0
      V0(33)= 0.0     V0(34)= 0.0     V0(35)= 0.0
      V0(36)= 0.0     V0(37)= 0.0     V0(38)= 0.0
      V0(39)= 0.0     V0(40)= 0.0     V0(41)= 0.0
      V0(42)= 0.0     V0(43)= 0.0     V0(44)= 0.0
      V0(45)= 0.0     V0(46)= 0.0     V0(47)= 0.0
      V0(48)= 0.0     V0(49)= 0.0     V0(50)= 0.0
      V0(51)= 0.0     V0(52)= 0.0     V0(53)= 0.0
      V0(54)= 0.0     V0(55)= 0.0     V0(56)= 0.0
      V0(57)= 0.0     V0(58)= 0.0     V0(59)= 0.0
      V0(60)= 0.0     V0(61)= 0.0     V0(62)= 0.0
      V0(63)= 0.0
25A : A6   B02       : A6= 2023
25B : A6   A6+A4     : A6= 2023
25C : A7   A0-A0     : A7=-1
25D : A0   A0+A6     : A0= 2023
26A : V1   ,A0,A7    : A0= 2023  A7=-1  VL=64
      V1( 0)= 0.0     V1( 1)= 0.0     V1( 2)= 0.0
      V1( 3)= 0.0     V1( 4)= 0.0     V1( 5)= 0.0
      V1( 6)= 0.0     V1( 7)= 0.0     V1( 8)= 0.0
      V1( 9)= 0.0     V1(10)= 0.0     V1(11)= 0.0
      V1(12)= 0.0     V1(13)= 0.0     V1(14)= 0.0
      V1(15)= 0.0     V1(16)= 0.0     V1(17)= 0.0
      V1(18)= 0.0     V1(19)= 0.0     V1(20)= 0.0
      V1(21)= 0.0     V1(22)= 0.0     V1(23)= 0.0
      V1(24)= 0.0     V1(25)= 0.0     V1(26)= 0.0
      V1(27)= 0.0     V1(28)= 0.0     V1(29)= 0.0
      V1(30)= 0.0     V1(31)= 0.0     V1(32)= 0.0
      V1(33)= 0.0     V1(34)= 0.0     V1(35)= 0.0
      V1(36)= 0.0     V1(37)= 0.0     V1(38)= 0.0
      V1(39)= 0.0     V1(40)= 0.0     V1(41)= 0.0
      V1(42)= 0.0     V1(43)= 0.0     V1(44)= 0.0
      V1(45)= 0.0     V1(46)= 0.0     V1(47)= 0.0
      V1(48)= 0.0     V1(49)= 0.0     V1(50)= 0.0
      V1(51)= 0.0     V1(52)= 0.0     V1(53)= 0.0
      V1(54)= 0.0     V1(55)= 0.0     V1(56)= 0.0
      V1(57)= 0.0     V1(58)= 0.0     V1(59)= 0.0
      V1(60)= 0.0     V1(61)= 0.0     V1(62)= 0.0
      V1(63)= 0.0
26B : B02  A6        : A6= 2023
26C : A3   01        : A3=    1
26D : A7   1747      : A7=  999
27B : B03  A7        : A7=  999
** INSTRUCTION ISSUE LIMIT EXCEEDED AT 27C
. TRACE OFF

. TRACE ON LEN=4                                      <-- (6)
. RU 20A #25
20A : A0   00        : A0=    0
20B : A2   7750,A0   : A0=    0  A2=    0
20D : A4   7751,A0   : A0=    0  A4=    0
21B : B05  A4        : A4=    0
21C : A4   A4*A2     : A4=    0
21D : A7   3747      : A7= 2023
22B : B02  A7        : A7= 2023
22C : A7   5747      : A7= 3047
23A : B04  A7        : A7= 3047
23B : A5   01        : A5=    1
23C : B20  A5        : A5=    1
23D : VL   A4        : VL=64
24A : A1   00        : A1=    0
24B : B07  A1        : A1=    0
24C : V3   S0&V4     : VL=64                          <-- (7)
      V3( 0)= O'0'    V3( 1)= O'0'    V3( 2)= O'0'
      V3( 3)= O'0'
24D : V0   S0+FV3    : VL=64
      V0( 0)= 0.0     V0( 1)= 0.0     V0( 2)= 0.0
      V0( 3)= 0.0
25A : A6   B02       : A6= 2023
25B : A6   A6+A4     : A6= 2023
25C : A7   A0-A0     : A7=-1
25D : A0   A0+A6     : A0= 2023
26A : V1   ,A0,A7    : A0= 2023  A7=-1  VL=64
      V1( 0)= 0.0     V1( 1)= 0.0     V1( 2)= 0.0
      V1( 3)= 0.0
26B : B02  A6        : A6= 2023
26C : A3   01        : A3=    1
26D : A7   1747      : A7=  999
27B : B03  A7        : A7=  999
** INSTRUCTION ISSUE LIMIT EXCEEDED AT 27C
. TRACE OFF

Figure 2.5.8.  TRACE Example


2.5.4  TACT  STAT  Report 

The  Task  ACTivity  STATistics  report  is  a  condensed  table  of 
all  tasking  activity  since  tasking  was  turned  on  (SET  TASK=ON) . 
Each  column  of  the  report  shows  how  much  time  (in  clock  periods) 
each  CPU  spent  executing  each  task.  The  percentage  of  the  total 
time  since  timing  was  turned  on  is  shown  beneath  the  number  of 
clock  periods.  The  last  column,  labelled  'TOT' ,  is  the  total 
amount  of  processor  time  spent  in  each  task. 

Each row of the report shows the time each processor spent
executing a particular task. The last row depicts the time each
CPU spent executing the tasks. The last entry in the last row
represents the overall task concurrency. If all the CPU's spent
all of their time executing defined tasks, this figure would be
the number of clock periods multiplied by the number of CPU's.
For example, if three CPU's spent forty percent of their time
executing defined tasks, the total concurrency percentage would
be 120% out of a total possible of 300%.
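
The concurrency figure can be reproduced with a few lines of Python.
This is a sketch under stated assumptions: the function name is
hypothetical, and each percentage is taken relative to the total number
of clock periods since timing was turned on.

```python
def task_concurrency(cpu_task_clocks, total_clocks):
    """Return (per-CPU percentages, overall percentage, maximum possible).

    cpu_task_clocks holds, for each CPU, the clock periods it spent
    executing defined tasks.
    """
    per_cpu = [100.0 * clocks / total_clocks for clocks in cpu_task_clocks]
    overall = sum(per_cpu)
    possible = 100.0 * len(cpu_task_clocks)
    return per_cpu, overall, possible


if __name__ == "__main__":
    # Three CPUs, each busy with defined tasks for 40% of 1000 clock periods.
    per_cpu, overall, possible = task_concurrency([400, 400, 400], 1000)
    print(per_cpu, overall, possible)
```

The overall figure is the sum of the per-CPU percentages, so it is
bounded by 100% times the number of CPU's.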

Figure 2.5.7.1 is a TACT STAT report from simulation of a
four-processor sparse matrix triangular factorization.

TASK STATISTICS

TASK       CPU 1     CPU 2     CPU 3     CPU 4       TOT

FAC1          86         0         0         0        86
           1.28%      0.0%      0.0%      0.0%     1.28%

JOIN1        369         0         0         0       369
           5.50%      0.0%      0.0%      0.0%     5.50%

MUL2           0         0         0         0         0
            0.0%      0.0%      0.0%      0.0%      0.0%

JOIN4         18         0         0         0        18
           0.27%      0.0%      0.0%      0.0%     0.27%

JOIN5          0         0         0         0         0
            0.0%      0.0%      0.0%      0.0%      0.0%

SOL1         325         0         0         0       325
           4.84%      0.0%      0.0%      0.0%     4.84%

mul:           0         0         0         0         0
            0.0%      0.0%      0.0%      0.0%      0.0%

MU-            0         0         0         0         0
            0.0%      0.0%      0.0%      0.0%      0.0%

SOL         1310      1310      1310      1310      5240
          19.52%    19.52%    19.52%    19.52%    78.08%

FAC            0       380         0         0       380
            0.0%     5.66%      0.0%      0.0%     5.66%

TOTAL       2108      1690      1310      1310      6418
          31.41%    25.18%    19.52%    19.52%    95.63%

TOTAL CLOCKS:  6711

Figure 2.5.7.1:  Sample TACT STAT Report
2.5.5  The TACT Report

A task is an identified group of instructions. It must have a
unique entry point and exit point. A name is associated with each task
that is defined. When a processor passes through a task entry point,
the name of the task is placed in that processor's column on the
TACT report. The name of the task remains in the processor's column
until the processor passes through the exit point of that task, at
which point the column entry is cleared.

The Task Activity Report is a detailed listing of all tasking
activity, similar to the CPACT Report. The TACT Report can be
thought of as a macro CPACT Report, with the CPU's representing functional
units and tasks representing instructions.
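
The bookkeeping implied by this description can be sketched in Python.
The class below is a hypothetical illustration, not the simulator's
data structure: it keeps the task name currently shown in each CPU
column and accumulates the clock periods spent in each task, which is
the kind of total the TACT STAT report summarizes.

```python
from collections import defaultdict


class TaskTracker:
    """Tracks which task each CPU is in and how long it spent there."""

    def __init__(self, n_cpus: int) -> None:
        self.current = [None] * n_cpus      # task name shown in each column
        self.entered_at = [0] * n_cpus      # clock period of the last entry
        self.clocks = defaultdict(lambda: [0] * n_cpus)  # task -> per-CPU time

    def enter(self, cpu: int, task: str, cp: int) -> None:
        """CPU passes through the entry point of a task at clock period cp."""
        self.current[cpu] = task
        self.entered_at[cpu] = cp

    def exit(self, cpu: int, cp: int) -> None:
        """CPU passes through the exit point; the column entry is cleared."""
        task = self.current[cpu]
        if task is not None:
            self.clocks[task][cpu] += cp - self.entered_at[cpu]
        self.current[cpu] = None

    def row(self, cp: int) -> str:
        """One report line: clock period followed by one column per CPU."""
        cells = "|".join(f"{(task or ''):<6}" for task in self.current)
        return f"{cp:5d}|{cells}|"


if __name__ == "__main__":
    tracker = TaskTracker(n_cpus=4)
    tracker.enter(0, "FAC1", cp=100)
    print(tracker.row(120))
    tracker.exit(0, cp=150)
    print(tracker.row(160))
    print(dict(tracker.clocks))
```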

To enable task information collection, the command "SET TASK=ON"
is issued from the command language. The simulator then prompts for
a file containing the task definitions for the programs being
simulated (for a sample task definition file, see Appendix K). To enable
TACT Reporting, the command "TACT filename" is then issued, where
"filename" is the name of the file to receive the TACT report output.
The program is then run in the usual fashion.

The sample report shown in Figure 2.5.7.2 is for a blocked
randomly sparse symmetric matrix factorization.
TASK ACTIVITY REPORT

CP      CPU 1     CPU 2     CPU 3     CPU 4

[Report body: one row per reported clock period, with the name of the
active task (JOIN4, FAC1, FAC, JOIN1, SOL1 or SOL) in each CPU's column.]

Figure 2.5.7.2  Sample TACT Report
2.6  Inconsistencies with the Cray-1

In this section, simulator behavior that is known to be
inconsistent with the Cray-1 will be discussed.

1)  The simulator does not simulate Cray-1 I/O.

2)  No Cray-1 monitor instructions are simulated.

3)  The Cray-1 exchange mechanism is not simulated.

4)  Recursive use of vector registers is not supported.

5)  The simulator floating point format (IBM 360/370)
differs from the Cray-1 format. (See section 2.1.3)

6)  The 071X2X instruction (Si +AK) produces a normalized
result in the simulator. Not so on the Cray-1.

7)  Though the timing of sizeable algorithms has been close
to the Cray-1 timing, with an error in the 1/2% range,
it is not exact.
3.  Simulator Command Descriptions

This section describes each of the simulator commands in detail.
Each command may be abbreviated, and the minimum acceptable
abbreviation is underlined. Each command description has the following format:

(1)  Purpose  -  The  function  of  the  command. 

(2)  Prototype  -  The  parameter  syntax  for  the  command. 

(3)  Description  -  A  detailed  description  of  the  command. 

(4)  Examples. 


The  following  syntax  is  used  to  describe  the  commands: 

Upper  case  characters  must  appear  exactly  as  shown. 

Lower  case  characters  represent  generic  parameter  names  which 
must  be  replaced  with  the  actual  parameters. 

Where  blanks  appear  one  or  more  blanks  must  appear. 

Square  brackets  are  used  to  denote  optional  parameters. 

Ellipsis  notation  (...)  is  used  to  denote  the  repetition  of  a 
parameter  list. 

Vertical  bars  are  used  to  separate  parameter  alternatives. 

Each command must be no longer than 80 characters. A simulator
subroutine call may, however, pass a segmented set of commands whose
combined length may not exceed 200 bytes; each individual command in
the set is still limited to 80 bytes.
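
The two length limits can be expressed as a simple check. The Python
sketch below is illustrative only: the function name is hypothetical,
one character is assumed to occupy one byte, and the combined length is
assumed to be the sum of the individual command lengths with no
allowance for separators.

```python
MAX_COMMAND_LENGTH = 80    # limit for any single command
MAX_COMBINED_LENGTH = 200  # limit for a segmented set passed in one call


def check_commands(commands):
    """Return a list of error messages; an empty list means the set is valid."""
    errors = []
    for index, command in enumerate(commands, start=1):
        if len(command) > MAX_COMMAND_LENGTH:
            errors.append(
                f"command {index} is {len(command)} characters "
                f"(limit {MAX_COMMAND_LENGTH})")
    combined = sum(len(command) for command in commands)
    if combined > MAX_COMBINED_LENGTH:
        errors.append(
            f"combined length is {combined} bytes "
            f"(limit {MAX_COMBINED_LENGTH})")
    return errors


if __name__ == "__main__":
    print(check_commands(["TRACE ON LEN=4", "RU 20A #25"]))
```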




Command  Summary 


BREAK 

CALCULATE 

CHANGE 

CLEAR 

COMMENT 

COST 

CPACT 

DEFINE 

DISPLAY  [@fmt-code] 

DUMP 

ENDFILE 

HELP 

IDENT 

INIT 

LOAD 


REMOVE 

RETURN 


STAT 

STOP 

TRACE 


$MTS-command 


ENABLE 

DISABLE 

TACT 


p-addr  [skip-cnt] 
p-addr  [skip-cnt] 
expression 
symbol  new-value 
[p-addr  ...] 
any  text 

[fdname  [COMPRESS  |  NOCOMPRESS]  [WIDE 
symbol  constant [ ,w( , p | ,v] 
symbol  [.length] 

[module  name] 

command-name 

module-name 

[s.a.]  fdname  ... 

[XREF] 

mts-command 

symbol 

[p-addr]  [tissue-limit] 
lhs=rhs  . . . 

[FULL] 

ON | OFF  [fdname]  [LEN  =  VL| trace  length] 
fdname  [NOECHO] 

CPU  number 
CPU  list 
CPU  list 
STAT I fdname 


NARROW]  ] 
AT

Command Description

Purpose    :  To set an AT point at a selected parcel address.

Prototype  :  AT p-addr [skip-count]
                 .
                 .   commands to process when the AT point is hit.
                 .
              END

Description: 

An AT point is set at the specified parcel address. This address
may be in modified octal format, which is the octal word address
followed by a parcel code A, B, C, or D, or may be a symbol with parcel
address attribute defined by the assembly language program. After the
AT command is entered, the command language will read more command
input. These commands are written, unprocessed, into a scratch file. A
maximum of 9 commands can be accepted. To terminate input, enter
the string "END" on a single line. AT points set at the lower parcel
of a two parcel instruction are ignored.

When  the  AT  point  is  hit,  the  scratch  file  will  be  opened  and 
subsequent  commands  read  from  that  file.  The  AT  file  is  terminated 
with  a  RUN  command  which  will  resume  the  simulation  automatically. 

To regain control when the AT point is hit, you must include the
command "USE *MSOURCE*" among the commands entered when setting up
the AT point.

An  optional  skip-count  may  be  provided  when  first  setting  the 
AT  point.  This  is  a  positive  decimal  number  which  indicates  the 
number  of  times  the  simulator  is  to  ignore  the  presence  of  this  AT 
point.  When  this  count  expires,  the  AT  point  will  be  recognized 
and  processed  as  above. 

When  the  AT  point  is  hit,  the  instruction  at  the  p-addr,  where 
it  is  set,  will  not  have  been  executed. 
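As an illustration of the modified octal format, the sketch below converts between a p-addr string such as 245B and a single integer parcel address. The function names are hypothetical, and the mapping (four parcels per word, so parcel address = 4 x word address + parcel index) is an assumption about how the simulator stores parcel addresses, not something this section states.

```python
PARCEL_CODES = "ABCD"   # parcel codes within one word, in order

def parse_modified_octal(text):
    """Convert e.g. '245B' to an integer parcel address (assumed 4*word + parcel)."""
    text = text.strip().upper()
    digits, code = text[:-1], text[-1:]
    if not digits or code not in PARCEL_CODES or any(c not in "01234567" for c in digits):
        raise ValueError(f"Invalid p-addr: {text!r}")
    return int(digits, 8) * 4 + PARCEL_CODES.index(code)

def format_modified_octal(parcel_address):
    """Inverse of parse_modified_octal."""
    word, parcel = divmod(parcel_address, 4)
    return f"{word:o}{PARCEL_CODES[parcel]}"
```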

Examples : 

AT 21A
DISPLAY S0 A0 M(33)
END

When the AT point is hit, registers S0 and A0 and memory
location 33 octal are displayed.


AT 245B 31
D M(0),10
USE *MSOURCE*
END

An AT point is set at location 245B. The first 31 (decimal)
times the instruction is executed, the AT point is skipped. Before
the instruction is executed again the AT point takes control and
displays memory locations 0 through 10 (octal). Control is then
given to the user terminal.

AT  SUBRI 
CHANGE  A5  A7 
END 

This  has  the  effect  of  patching  a  new  instruction  (A5  A7) 
at  location  with  label  SUBRI  (i.e.,  A7  is  stored  into  A5) . 


Error  Responses: 

Invalid  p-addr  - 

The  p-addr  is  unrecognizable  or  out  of  range  of  the  current 
memory  size. 

Undefined  Symbol  - 

The  symbol  specified  is  not  defined  in  the  current  (IDENT) 
module. 

Symbol  does  not  reference  a  parcel  address  - 

The  symbol  has  word  address  or  value  attribute. 

Invalid  Skip  count  - 

Skip  count  unrecognizable  or  negative. 

No  room  for  more  Break  or  At  points  - 

Only forty Break or At points may be set at any one time.

Break or At point already set here -

A  Break  or  At  point  is  already  set  at  this  p-addr. 

Can't  open  AT  point  file  - 

The  command  language  was  unable  to  open  the  AT  file  for 
saving  the  commands.  This  could  occur  if  the  MTS  scratch 
file  character  is  changed  from  a  minus  sign. 


BREAK 


Command  Description 


Purpose  :  To  control  program  flow  by  setting  break  points. 

Prototype  :  BREAK  p-addr  [skip-cnt] 

Description: 

A break point is set at the indicated parcel address. The parcel
address may be specified in a modified octal format (an octal word
address followed by a parcel code A, B, C or D), or may be a symbol with
parcel address attribute. An optional decimal skip count may be specified
and will cause the break point to be ignored the indicated number of times.

The effect of hitting a break point is equivalent to issuing
the command "USE *MSOURCE*". Continuation from a break point is
accomplished by entering a RUN command. An end-of-file condition from the
terminal (by $ENDFILE or control-c) will cause the command stack (see
section 2.1.1) to be popped. This allows further commands to come
from a prior USE command or a subroutine call command string. If a
RUN command is entered without any parameters, the remaining issue
limit is used and simulation continues with the broken instruction.
No timing information is lost and no additional time is required. If
a p-addr is specified on the RUN command, an instruction buffer fetch
is forced and this will alter the timing.

When  the  break  point  is  hit,  the  broken  instruction  has  not  been 
executed.  Break  points  do  not  modify  the  Cray-1  memory,  so  they  may 
be  set  before  the  program  is  loaded.  A  maximum  of  forty  BREAK  and  AT 
points  may  be  set.  Break  points  on  the  lower  parcel  of  a  two  parcel 
instruction  are  ignored.  The  user  is  not  notified  of  this. 
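The skip-count and forty-point rules above can be modelled with a small table. This is a sketch only: the class and method names are hypothetical, and it assumes that once the skip count has expired the point stops execution on every later hit, which the text implies but does not say outright.

```python
MAX_POINTS = 40   # combined limit on BREAK and AT points, from the text above

class PointTable:
    """Hypothetical model of the BREAK/AT point table with skip counts."""

    def __init__(self):
        self._skips = {}   # parcel address -> remaining skip count

    def set_point(self, p_addr, skip_count=0):
        if skip_count < 0:
            raise ValueError("Invalid skip count")
        if p_addr in self._skips:
            raise ValueError("Break or At point already set here")
        if len(self._skips) >= MAX_POINTS:
            raise ValueError("No room for more Break or At points")
        self._skips[p_addr] = skip_count

    def should_stop(self, p_addr):
        """Call before issuing the instruction at p_addr."""
        if p_addr not in self._skips:
            return False
        if self._skips[p_addr] > 0:
            self._skips[p_addr] -= 1   # still skipping: ignore this hit
            return False
        return True                    # instruction has not been executed yet
```

With `set_point(addr, 18)`, as in the example BR 13B 18, the first 18 calls to `should_stop(addr)` return False and the 19th returns True.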

Examples: 

BREAK  25A 
BR  13B  18 

B  LABI 

Error  Responses : 

Invalid  p-addr  - 

The p-addr is unrecognizable or out of range of the current
memory size.

Invalid skip count -

Skip  count  unrecognizable  or  negative. 

No  room  for  more  BREAK  or  AT  points  - 

Only  forty  BREAK  or  AT  points  may  be  set  at  any  one 
time. 

Break  or  At  point  already  set  here  - 

A BREAK or AT point is already set at this p-addr.

Undefined Symbol -

The  symbol  specified  is  not  defined  in  the  current  (IDENT) 
module. 

Symbol  does  not  reference  a  parcel  address  - 

The  symbol  has  word  address  or  value  attribute. 


CALCULATE 

Command  description 

Purpose : To calculate integer offsets for memory displacements

Prototype : CALCULATE <symbol><op><symbol> ... <op><symbol>

Description :

The integer expression is evaluated strictly left to right. Only
four operators (<op>) are allowed (the examples below use + and *).
The result is displayed in decimal, octal, and modified octal. The
operands (<symbol>) may be replaced with any pre-defined or user
defined symbol (see the DISPLAY command) or a constant as follows:

nnn     - for an octal integer constant
O'nnn'  - for an octal integer constant
D'nnn'  - for a decimal integer constant.

All operands are interpreted as integers with only the lower 32 bits
of any 64 bit symbol (e.g., S-registers) taking part in the computation.
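The strict left-to-right rule means there is no operator precedence: 2+3*4 is evaluated as (2+3)*4. The sketch below illustrates this under several assumptions that are not taken from the text: the four operators are taken to be +, -, * and integer /, the result is truncated to 32 bits, and symbol values are supplied in a plain dictionary. All names are hypothetical.

```python
import operator
import re

MASK32 = 0xFFFFFFFF
# Assumed operator set; the text only says that four operators exist.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}
TOKEN = re.compile(r"O'[0-7]+'|D'\d+'|[0-7]+|[A-Za-z][\w.()]*|[-+*/]")

def operand_value(token, symbols):
    if token.startswith("O'"):
        return int(token[2:-1], 8)
    if token.startswith("D'"):
        return int(token[2:-1], 10)
    if token[0].isdigit():
        return int(token, 8)              # a bare number is octal
    if token not in symbols:
        raise ValueError("Calc unable to recognize operand")
    return symbols[token]

def calculate(expression, symbols=None):
    """Evaluate strictly left to right, using only the low 32 bits of operands."""
    symbols = symbols or {}
    compact = re.sub(r"\s+", "", expression)
    tokens = TOKEN.findall(compact)
    if "".join(tokens) != compact or len(tokens) % 2 == 0:
        raise ValueError("Calc unable to recognize operator or operand")
    result = operand_value(tokens[0], symbols) & MASK32
    for op, token in zip(tokens[1::2], tokens[2::2]):
        if op not in OPS:
            raise ValueError("Calc unable to recognize operator")
        result = OPS[op](result, operand_value(token, symbols) & MASK32) & MASK32
    return result
```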

Examples : 

CALC  o'm'+D'ss?’ 

CALC A1*D'50'+B.R1
CALC M(32)+S4

Error Responses :

Calc unable to recognize operator -

Bad operator seen in expression.

Calc  unable  to  recognize  operand  - 

Bad  operand  or  invalid  pre-defined  symbol  seen  in  expression. 




CHANGE

Command  Description 

Purpose  :  To  alter  Cray-1  storage  locations  in  the  simulator 
Prototype  :  CHANGE  symbol  new-value 
Description : 

The CHANGE command allows any program accessible storage location
in the Cray-1 simulator to be changed. The symbol parameter may
be replaced with any of the predefined symbols which may be changed.
See the DISPLAY command description for a list of the valid symbols.

The new-value parameter may be replaced with any pre-defined
symbol or one of the following constants:

nnn         - for an octal integer
O'nnn'      - for an octal integer constant
D'nnn'      - for a decimal integer constant
nnn.nnnDnn  - for a double precision floating point constant.
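The four constant forms can be told apart by their spelling alone. The sketch below is one way to do that; the function name is hypothetical, and treating any constant that contains a decimal point as floating point is an assumption drawn from the nnn.nnnDnn pattern.

```python
def parse_constant(text):
    """Convert one of the constant forms listed above to a Python number."""
    text = text.strip().upper()
    if text.startswith("O'") and text.endswith("'"):
        return int(text[2:-1], 8)          # O'nnn': octal
    if text.startswith("D'") and text.endswith("'"):
        return int(text[2:-1], 10)         # D'nnn': decimal
    if "." in text:
        # nnn.nnnDnn: Fortran-style double precision exponent marker
        return float(text.replace("D", "E"))
    return int(text, 8)                    # bare nnn: octal
```

For instance, `parse_constant("O'377'")` and `parse_constant("377")` both give 255, and `parse_constant("2.5D0")` gives 2.5.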


Examples : 

CHANGE A1 O'377'

CH M(55) 2.5D0

CH V3(14) 2.32D27

CH  M(5J0)  M(100) 

CH  B.BREG  T.TREG 

CH M(1) D'-1234567890123'

Error  Responses: 

Change  unable  to  modify  symbol  - 

See  the  DISPLAY  command  for  a  list  of  the  symbols  that 
may  not  be  changed. 

Change  unable  to  evaluate  symbol  - 

The  symbol  is  not  a  legitimate  symbol 


CLEAR 

Command  description 

Purpose  :  To  clear  break  points  and  at  points 
Prototype  :  CLEAR  [p-addr  ...] 

Description: 

The CLEAR command is used to remove any break points or at
points that have been previously set. If no parcel address parameters
are specified, then all break and at points will be cleared.
If one or more parcel addresses are supplied as arguments to the
CLEAR command, then the break or at points set at these locations
will be cleared.

Examples : 

CLEAR 

CL 21A 35C 74D LOOP1

Error  Responses: 

Invalid  p-addr  - 

The  p-addr  is  invalid  or  out  of  range. 

No break or at points are set -

Nothing set at this address -

Undefined Symbol -

The symbol specified is not defined in the current (IDENT)
module.

Symbol  does  not  reference  a  parcel  address  - 

The  symbol  has  word  address  or  value  attribute. 




COMMENT 

Command  description 


Purpose  :  To  provide  documentation  about  a  simulator  session 
Prototype : COMMENT any text string

Description:

The COMMENT command is useful for documenting a terminal session
or for generating advisory notices from AT command files or subroutine
call-files. With AT command files, the commands are not echoed to the
output device; however, COMMENT commands will echo. Also, on a
subroutine call to the simulator, COMMENT commands in the subroutine
command string will echo regardless of the value of the echo parameter.
Both of these features are useful to remind the user of any critical
information.


Examples : 

COMMENT  ANY  TEXT  STRING  MAY  BE  SUPPLIED. 

CO  THIS  AT  POINT  ALTERS  S3. 

CO  DON'T  FORGET  TO  SET  UP  LOCATION  34 

Error  Responses: 

None 


COST

Command  description 


Purpose  :  To  print  out  processing  costs. 

Prototype  :  COST 
Description : 

The  COST  command  will  display  simulator  processing  cost 
information.  The  cost  figures  cover  the  period  from  program 
start-up  or  the  last  INIT  command  to  the  present. 

The  following  information  is  displayed: 


SIMULATION COSTS SINCE LAST INIT

Report line          Meaning

RTC                  The number of simulation clock periods.

# INSTR. ISSUED      The number of instructions issued.

HOST CPU TIME        The host (machine on which the simulator is
                     running) CPU time, in seconds.

HOST COST            The host dollar cost and job priority.

PRINTING COSTS       The printing costs, with the number of lines
                     and pages (useful for CPACT output).

INSTRUCTION RATE     The simulation rate in issued instructions per
                     host CPU second.

INSTRUCTION COST     The simulation cost in dollars per thousand
                     issued instructions.

HOST / CRAY-1 TIME   The ratio of host CPU time to the Cray-1 CPU
                     time, using the number of simulation clock
                     periods multiplied by 12.5 ns.


Examples :

COST
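The last figure of the COST report follows directly from the definition above: Cray-1 time is the clock period count multiplied by 12.5 ns. A minimal sketch, with a hypothetical function name and the assumption that a ratio of 0.0 is reported when no clock periods have been simulated:

```python
CRAY1_CLOCK_PERIOD_SECONDS = 12.5e-9   # 12.5 ns per clock period

def host_to_cray_ratio(host_cpu_seconds, clock_periods):
    """Ratio of host CPU time to simulated Cray-1 CPU time."""
    cray_seconds = clock_periods * CRAY1_CLOCK_PERIOD_SECONDS
    if cray_seconds == 0:
        return 0.0   # assumption: nothing simulated yet
    return host_cpu_seconds / cray_seconds
```

As a worked case, 80,000,000 clock periods correspond to one second of Cray-1 time, so two seconds of host CPU time give a ratio of about 2.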


Error  Responses: 
None. 


CPACT

Command  description 

Purpose  :  To  control  the  generation  of  the  clock  period 
activity  report. 

Prototype : CPACT [fdname [COMPRESS | NOCOMPRESS] [WIDE | NARROW]]

Description:

The CPACT command enables and disables the clock period activity
report. If the fdname (MTS file or device name) is supplied,
CPACT is enabled and the report is directed to the specified fdname.
If no fdname is supplied, CPACT is disabled. It should be noted that
enabling CPACT will increase the simulation cost over the non-timing
simulation mode by roughly a factor of forty to fifty. If timing
is off (see the SET command) when CPACT is enabled, CPACT will turn
TIMING on.

Since  one  line  of  output  is  generated  for  each  clock  period  of 
simulation  time,  quite  a  bit  of  output  can  be  generated  fairly  fast. 
To  keep  cost  to  a  minimum,  under  MTS,  CPACT  should  be  sent  directly 
to  *PRINT*.  If  COMPRESS  is  specified,  identical  hold  issue  lines 
will  be  suppressed.  COMPRESS  is  the  default. 

The  normal  CPACT  report  is  suitable  for  printing  on  132  column 
printers.  If  NARROW  is  specified  or  fdname  corresponds  to  the  user's 
terminal,  the  report  will  be  condensed  to  80  columns. 

The report produced by CPACT is described in more detail in
section 2.5.

Examples : 

CPACT  *PRINT* 

CP 

CP *SINK* NARROW

Error Responses:

CPACT  already  enabled  - 

CPACT  with  an  fdname  was  given  when  CPACT  was  already 
enabled. 


CPU 

Command  Description 

Purpose:  To  specify  the  indicated  CPU  as  the  current  CPU 

Prototype:  CPU  cpu  number 

Description : 

The  CPU  command  is  used  to  change  the  current  cpu  to  the 
specified  cpu.  The  current  cpu  is  the  cpu  to  which  all  commands 
issued  apply  (specifically  TRACE,  CPACT,  DISPLAY  and  CHANGE). 

The CPU command can be thought of as a global scope specifying
command. Instead of specifying to which CPU each command pertains,
a global CPU number is specified, and all subsequent
applicable commands pertain to the specified CPU.

To  enable  instruction  tracing  on  CPU  3,  the  commands  "CPU 
3"  and  "TRACE  ON"  are  given.  To  change  the  program  counter  of 
CPU  1  (i.e.,  prior  to  a  RUN  command)  the  commands  "CPU  1"  and 
"CH  P  MAIN"  are  given. 

Examples : 

CPU  2 
CPU  4 

Error  responses: 

Invalid  CPU  number  - 

Issued  when  an  incorrect  value  is  specified  for 
the  CPU  number. 


DEFINE 

Command Description

Purpose : To define a new symbol

Prototype : DEFINE symbol constant[,W | ,P | ,V]

Description:

The DEFINE command adds a new symbol to the symbol table.
It may then be used by other simulator commands.

The value of the symbol will be the constant. If the constant
is octal, the type of the symbol will be word address.
If the constant is modified octal, the type will be parcel
address.

The default type may be overridden by specifying W, P, or
V. If this is done, the symbol will be defined as type
word address, parcel address, or value, respectively.
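The type rule for DEFINE can be summarised in a few lines. This is a sketch with a hypothetical function name; it assumes that a modified octal constant is recognised by a trailing parcel code A, B, C or D.

```python
TYPE_NAMES = {"W": "word address", "P": "parcel address", "V": "value"}

def defined_symbol_type(constant, override=None):
    """Type given to a DEFINEd symbol: explicit override first, else by constant form."""
    if override is not None:
        return TYPE_NAMES[override.upper()]
    if constant and constant[-1].upper() in "ABCD":
        return "parcel address"   # modified octal, e.g. 22B
    return "word address"         # plain octal, e.g. 100
```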

The  symbol  is  added  to  the  symbol  table  of  the  current  IDENT; 
if  there  is  no  governing  IDENT,  then  an  error  will  result.  The  user 
must  issue  at  least  one  IDENT  command  before  using  the  DEFINE  command. 

Examples : 


DEFINE START 22B
DEF ARAY1 100
DEF BNAME 77,V


DISABLE 

Command  description 

Purpose:  To  deactivate  a  cpu. 

Prototype:  DISABLE  cpu  list 

Description: 

The  DISABLE  command  is  used  to  "turn  off"  a  cpu.  Any  cpu 
except  the  current  cpu  can  be  specified  by  the  command. 


Examples : 

DISABLE  234 
DISA  3 

Error  responses: 

Cannot  disable  current  cpu  - 

Issued when trying to disable the cpu last
specified by the CPU command.

Invalid cpu number -

Issued  when  an  incorrect  value  is  specified  in 
the  CPU  list. 




DISPLAY 


Command  Description 




Purpose  :  To  allow  the  user  to  examine  the  registers  and  memory  of 
the  simulated  Cray-1. 

Prototype : DISPLAY[@fmt] symbol[,length] ...

Description:

The DISPLAY command provides a facility through which the user may
examine Cray-1 registers and memory. The location to be displayed
(symbol) is represented by any of the predefined symbols shown in the
table below, or a user defined symbol. Subsequent contiguous locations
can be displayed by providing a length parameter, separated from the
symbol name by a comma. The length parameter may be a symbol name
(e.g., VL) or a decimal integer constant. Also noted in this table is
whether or not the CHANGE command will alter the symbol.

Each symbol has a default display format associated with it. The
default may be overridden for all symbols on the command by appending
display format codes (fmt) to the command name. The format code string
is prefixed with @. These format codes are defined as follows.


FORMAT                  DISPLAY

Code  Meaning           64-bit Operand       24-bit Operand    16-bit Operand

E     Floating pt.      Floating pt.         N.A.              N.A.
F     Fixed pt.         64-bit integer       24-bit integer    16-bit integer
O     Octal             64-bit octal         24-bit octal      16-bit octal
P     Parcel            4 octal parcels      N.A.              16-bit octal
M     Modified Octal    4 M. octal parcels   24-bit M. octal   16-bit M. octal
I     Instruction       4 Instr. Mnemonics   N.A.              Instr. Mnemonic
S     Symbolic          Symbol               Symbol            Symbol

User defined symbols are those symbols defined by the assembly
language program and contained in a relocatable load module. These
symbols may be one of three types: parcel address, word address or
value. A parcel address symbol is treated as a 16 bit operand, and
names a parcel memory location. A word address symbol is treated as a
64 bit operand, and names a word memory location. A value symbol may be
used to name an A, B, S, T or V register.

The  only  user  defined  symbols  which  may  be  referenced  are  those 
in  the  current  module.  See  the  IDENT  command. 

For  the  operand  -  format  code  combinations  which  are  not  applicable, 
no  value  will  be  displayed. 

Pre-Defined Symbol Table


Symbol       Cray-1 storage               Region length   Change
name         location                     allowed?        allowed?

Vn(elt)      Vector registers             Yes             Yes
Sn           Scalar registers             Yes             Yes
Tnn          T-registers                  Yes             Yes
An           Address registers            Yes             Yes
Bnn          B-registers                  Yes             Yes
M(addr)      Memory, words                Yes             Yes
IM(p-addr)   Memory, parcels              Yes             Yes
P            P-register                   No              Yes
CIP          Current instruction parcel   No              No
NIP          Next instruction parcel      No              No
LIP          Lower instruction parcel     No              No
VM           Vector mask register         No              Yes
VL           Vector length register       No              Yes
RTC          Real time clock              No              Yes
XP           Exchange package             No              No
IBn(elt)     Instruction buffers          Yes             No
RF           Relocation factor            No              Yes
FLAGS        FLAGS register               No              No
LA           Limit address register       No              No
BA           Base address register        No              No
MSIZ         Value of memory size         No              No
MODE         Mode register                No              Yes

n or nn is a register number
elt is an element index within a register
addr is a word address, may be an expression
p-addr is a parcel address, may be an expression

Examples: 

DISPLAY A1 S3 A.LOOPCNTR V.ROW1 SUBRTN1
D@O A0,8

D@PI IM(P),10 MAIN,LEN$

D@EFO V0(0),VL


Error  responses: 

Invalid format code -

See the format code list above.

Invalid symbol

Invalid integer
DUMP

Command  description 


Purpose  :  To  display  the  contents  of  all  data  areas  of  memory. 
Prototype  :  DUMP  [module-name] 

Description: 

The  DUMP  command  displays  the  contents  of  memory  addressed  by 
all  symbols  of  type  word  address.  The  memory  locations  are  displayed 
in  floating  and  fixed  formats. 

If  a  module-name  is  specified,  only  the  module  with  corresponding 
IDENT  name  is  dumped. 


Examples : 

DUMP 

DU  SUBRI 

Error  Responses: 

Module not loaded -

The specified module-name is not the name of any loaded module.


ENABLE

Command  description 

Purpose:  To  activate  other  cpu's  for  multi-tasking 

Prototype:  ENABLE  cpu  list 

Description: 

The ENABLE command is used to "turn on" a cpu. The RUN
command applies to all cpu's activated by the ENABLE command.

Any or all of the cpu's can be specified by the ENABLE command.
Up to four cpu's (1, 2, 3, 4) can be enabled in the current version
of the simulator; this can readily be changed in the source
code.

Examples: 

ENABLE  234 
ENA  3  4 

Error  response : 

cpu  already  enabled  - 

Issued  when  a  cpu  specified  in  the  list  is  already 
activated 

Invalid  cpu  number  - 

Issued  when  an  invalid  number  is  specified  in  the 
cpu  list 

ENDFILE 

Command  description 

Purpose  :  To  signal  an  end-of-file  condition  to  a  USE  command 
Prototype  :  ENDFILE 
Description : 

This command terminates the effect of the current USE command.
It pops the command stack causing input to be read from the previous
source. See section 2.1.1 for more information about command input
control. If a USE command is not in progress, ENDFILE is a no-op.
That is, ENDFILE will not terminate a call-file or an AT-file.
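The behaviour of ENDFILE can be pictured as a conditional pop on a stack of input sources. The sketch below is a simplified model, not the simulator's data structure: the class, the kind labels ("use", "at", "call") and the file name -ATFILE are all hypothetical.

```python
class CommandStack:
    """Hypothetical model of the command input stack."""

    def __init__(self, base="*MSOURCE*"):
        self._sources = [("base", base)]

    def push(self, kind, name):
        """kind is 'use', 'at' or 'call' (labels assumed for this sketch)."""
        self._sources.append((kind, name))

    def endfile(self):
        """Pop only if the current source came from a USE command."""
        if self._sources[-1][0] == "use":
            self._sources.pop()
            return True
        return False   # no-op for the base source, call-files and AT-files

    @property
    def current(self):
        return self._sources[-1][1]
```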

Examples : 

ENDFILE 
E 


HELP 


Command  description 

Purpose  :  To  provide  on-line  information  about  command  syntax 
and  function. 

Prototype  :  HELP  [cmd-name  ...] 

Description : 

The HELP command takes as parameters the simulator command
names. For each command name (cmd-name) given, a brief description
is printed. A keyboard attention may be used to abort the HELP
output. If no command name is provided, a list of the legal commands
is printed.

Examples: 

HELP DISPLAY CHANGE
HELP 
H  HELP 

Error  Responses: 

I  can't  help  you  - 

The  file  containing  the  HELP  responses  doesn't  exist  or 
couldn't  be  accessed. 

Error  during  HELP  - 

An  error  occurred  during  I/O  to  the  output  device. 


IDENT 

Command description

Purpose : To determine the subset of user defined symbols
which may be referenced by other commands.

Prototype  :  IDENT  module-name 
Description  : 

Relocatable modules loaded by the simulator contain the definitions
of all symbols defined in the assembly language program. Since
assembly programs can be assembled and loaded independently, these
symbols may not be unique. Only those symbols defined within a single
module may be used at any given time.

The name field on the IDENT command must be the name contained on
the IDENT record of one of the loaded modules. Only the symbols
contained in that module will be available for use by other commands.

Examples: 

IDENT  MAIN 
ID  SUBROUTN 

Error  Responses : 

Module  not  loaded  - 

The  name  specified  did  not  appear  on  the  IDENT  card  of  any 
loaded  module. 




INIT 

Command  Description 

Purpose:  To  re-initialize  the  simulator 

Prototype:  INIT  [STAT] 

Description: 

The  INIT  command  allows  re-initialization  of  the  simulator 
state  between  runs  of  a  program.  It  has  the  following  effects: 

1.  All  timing  information  is  initialized. 

2.  All  report  information  is  initialized. 

3.  The  simulator  state  is  cleared. 

4.  The  CPU  clock  is  cleared. 

5. The CIP, NIP, LIP, and instruction buffers are invalidated.

INIT will not alter the A, B, S, T, V, VL, MODE, P, and VM registers or
simulator memory.

If the STAT parameter is specified, then only the timing
information and the CPU clock are initialized.

Examples : 

INIT 
I  STAT 


LOAD

Command  description 

Purpose  :  To  load  programs  into  the  simulated  CRAY-1  memory. 

Prototype : LOAD [s.a.] fdname ...

Description:

The LOAD command opens the file or device (fdname) and reads
one or more load modules from it. The load modules may be absolute
or relocatable. See appendix K for a discussion of load module
formats.

Absolute load modules are loaded at the address specified in
the module. The octal starting address (s.a.), if specified
preceding the fdname, is ignored.

Relocatable  modules  are  loaded  at  the  first  available  16  word 
boundary,  unless  an  octal  starting  word  address  (s.a.)  is  specified. 
Modules  will  be  relocated  and  linked  by  the  loader. 
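The placement rule for relocatable modules amounts to rounding the first free word address up to a multiple of 16, unless a starting address is given. A minimal sketch, with a hypothetical function name and the assumption that an explicit s.a. is used exactly as supplied:

```python
def relocatable_load_address(first_free_word, start_address=None, boundary=16):
    """Word address at which a relocatable module would be placed."""
    if start_address is not None:
        return start_address                           # explicit s.a. wins
    return -(-first_free_word // boundary) * boundary  # round up to boundary
```

For example, if the first free word is 21 (octal), the module is placed at 40 (octal).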

Examples : 

LOAD  OBJ 
LOAD 30 FILE1
LOAD  *SOURCE* 

Error  responses 

Unresolved  externals  exist  - 

A  relocatable  module  references  a  module  which  is  not  loaded. 
The  user  will  be  prompted  for  more  loader  input. 
MAP

Command Description


Purpose  :  To  display  the  locations  of  all  loaded  modules 

Prototype  :  MAP 
Description  : 

The MAP command will display the names, starting locations,
and lengths of all loaded modules.

Examples : 

MAP 

MA 
MTS

Command  description 

Purpose  :  To  provide  a  command  interface  between  the 
simulator  and  MTS. 

Prototype : 1. MTS [mts-command]

2.  $mts-command 

Description : 

The  MTS  command  allows  the  user  to  pass  commands  to  MTS 
without  stopping  the  simulator.  In  the  first  prototype  an 
optional  MTS  command  may  be  supplied.  Return  is  made  to  MTS 
with  MTS  processing  the  command.  The  user  may  restart  the 
simulator  with  the  $RESTART  MTS  command.  The  second  prototype 
allows  the  issuing  of  a  one-shot  MTS  command.  That  is,  the 
command  is  passed  to  MTS  but  control  returns  to  the  simulator 
automatically  when  the  command  finishes.  Any  command  input 
to  the  simulator  that  is  prefixed  with  a  dollar  sign  is  treated 
as  one-shot  MTS  command. 

Examples : 

MTS 

M  DIS  VMSIZE 
$EMPTY -RPT
$EDIT  TRIDEC 
$SDS 

Error  Responses: 

None. 


REMOVE

Command  Description 

Purpose:  To  remove  a  symbol  from  the  simulator's  symbol  tables 

Prototype:  REMOVE  symbol 

Description: 

The specified symbol is removed from the symbol table of
the current IDENT (set via the IDENT command). If an
IDENT command has not been previously given, the issue of a REMOVE
command will cause an error.

Examples: 

REMOVE  START 
REM ARRAY1

Error  Response: 

symbol  is  not  defined. 
RETURN

Command  description 

Purpose  :  To  allow  the  simulator  to  return  to  its  caller. 

Prototype  :  RETURN 

Description: 

The RETURN command is used to force the simulator to return
to its caller. Normally, when the simulator is called as a
subroutine, the CRAY-1 interface subroutine will automatically place
a RETURN command at the end of the call-file after all other user
commands. As the call-file is processed this RETURN will eventually
be executed. To cause an early return to the caller, the user may
issue a RETURN command, thereby skipping the remaining commands in
the call-file.

A  RETURN  command  issued  when  running  the  simulator  stand-alone 
is  equivalent  to  a  STOP  command. 

Examples: 

RETURN 
RUN

Command  Description 

Purpose:  To  begin  simulation  of  a  Cray-M  assembly  program 

Prototype: RUN [#issue limit]

Description : 

The  RUN  command  is  the  only  simulator  command  that  actually 
begins  the  simulation.  All  active  CPU's  begin  execution  at  their 
current  program  counter  locations. 

Before the initial RUN command is given, the program counters
of all the active cpu's must be given initial values. The starting
location must be specified by using the CHANGE command:

CPU cpu-number
CHANGE P <start address>

This alters the program counter for the specified cpu. An error
message is issued if an active CPU has an uninitialized program
counter when a RUN command is given.
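When the simulator is driven as a subroutine, the CPU / CHANGE P pairs described above can be generated for each active cpu. The helper below is a hypothetical sketch that only builds command strings in the syntax shown in this section; it does not enable the cpu's and does not call the simulator.

```python
def startup_commands(start_addresses, issue_limit=None):
    """Build the command list that sets each cpu's P register and then runs.

    start_addresses maps a cpu number to a start address or symbol,
    e.g. {1: "MAIN", 2: "245B"}.
    """
    commands = []
    for cpu in sorted(start_addresses):
        commands.append(f"CPU {cpu}")                       # select current cpu
        commands.append(f"CHANGE P {start_addresses[cpu]}")  # set its program counter
    commands.append("RUN" if issue_limit is None else f"RUN #{issue_limit}")
    return commands
```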

The optional issue limit parameter can be used to control the
simulation. This must be a positive decimal number prefixed by a
pound sign (#) and is used to prevent run-away programs or to allow
single stepping through a program. If an issue limit is not provided,
the remainder of a previous issue limit is used or, if no
remainder is left, a default value of 1000 is supplied.
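The order in which the issue limit is chosen (explicit value, then remainder, then default) can be written out as follows. The function and parameter names are hypothetical; the default of 1000 is the value given in the text.

```python
DEFAULT_ISSUE_LIMIT = 1000   # default stated above

def effective_issue_limit(explicit=None, remaining=0, default=DEFAULT_ISSUE_LIMIT):
    """Issue limit used by a RUN command."""
    if explicit is not None:
        if explicit <= 0:
            raise ValueError("Invalid issue limit")
        return explicit          # RUN #n
    if remaining > 0:
        return remaining         # left over from a previous RUN
    return default
```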

There  are  many  conditions  that  can  arise  to  stop  the  simulation. 
Normally,  a  run  command  will  terminate  when  an  EX  instruction  (004000) 
is  executed  and  this  is  the  usual  procedure  to  stop  a  program.  Other 
common  conditions  that  stop  simulation  are  breakpoints,  at-points,  or 
issue  limit  expired.  The  exceptional  conditions  that  halt  simulation 
are  discussed  in  Section  2.2. 
Examples :

RUN #5000
RUN
R
R #1

The last example illustrates how the user would single step through
the program, executing one instruction at a time.

Error  response: 

CPU  program  counter  not  set  - 

Issued when a CPU's program counter has not been
initialized.

Invalid  issue  limit  - 

Issued  when  an  incorrect  issue  limit  is  specified. 
SET

Command Description

Purpose : To permit alteration of user settable switches

Prototype : SET lhs=rhs ...

Description: 

The SET command allows the user to control some of the features
of the simulator. Each parameter is composed of a left hand
side (lhs), an equal sign and a right hand side (rhs). The left
hand sides are the keyword names and the right hand sides are
the new keyword values. The table below lists the legitimate
left hand sides followed by a discussion of each one.


Keyword    Keyword values      Default

EFI        ON, OFF             ON
ISLIMIT    positive integer    1000
MACHINE    CRAY1, CRAY1-A      CRAY1
MEMORY     SECDED, PARITY      SECDED
OUTPUT     any fdname          *MSINK*
TIMING     ON, OFF             OFF
TASK       ON, OFF             OFF


EFI        Default: ON

The EFI (enable floating point interrupt) keyword allows
user control of interrupts caused by:

1) Exponent overflow

2) Exponent underflow

3) Floating point division by zero.

If EFI is ON, the above three conditions will stop simulation. If
EFI is OFF, these three conditions will be ignored when they occur.
When the simulator starts up EFI is on by default. EFI is a mode
bit in the Cray-1 mode register and the Cray-1 instructions EFI and
DFI can also set or clear this bit.
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! 


ISLIMIT 


Default:  1000 

The ISLIMIT keyword allows the user to change the default
instruction issue limit. If no issue limit is specified on the
RUN command and no remaining issue limit exists from previous
run commands, the default instruction issue limit is used. A
positive decimal integer must be specified on the right hand
side. When the simulator starts up this keyword has a default
value of 1000. Setting ISLIMIT to one is useful for single stepping
through the program.
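
The order in which the issue limit is chosen can be sketched as follows. This is a minimal Python illustration of the rule described above, not the simulator's own Fortran; the function name `effective_issue_limit` and its arguments are hypothetical, and rejecting a non-positive RUN limit is an assumption based on the "Invalid issue limit" error response.

```python
DEFAULT_ISLIMIT = 1000  # start-up value of the ISLIMIT keyword


def effective_issue_limit(run_limit=None, remaining=0, islimit=DEFAULT_ISLIMIT):
    """Pick the issue limit that applies to a RUN command.

    run_limit -- limit given on the RUN command (e.g. RUN #5000), or None
    remaining -- issue limit left over from previous RUN commands
    islimit   -- current value of the ISLIMIT keyword
    """
    if run_limit is not None:
        if run_limit <= 0:
            # Assumed behaviour: mirrors the "Invalid issue limit" response.
            raise ValueError("Invalid issue limit")
        return run_limit
    if remaining > 0:
        return remaining
    return islimit
```

With `SET ISLIMIT=1` in force and nothing left over from an earlier run, a bare `RUN` would then issue one instruction, which is the single-stepping case mentioned above.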


MACHINE  Default: CRAY1

This keyword is intended for selecting the use of experimental
architectural modifications to the Cray-1 simulator. The current
legitimate keyword values are "CRAY1" and "CRAY1-A", with the default
being "CRAY1". When CRAY1 is selected, normal Cray-1 timing is
in force. Currently, selecting CRAY1-A invokes only one Cray-1
architectural modification: that of improved memory bandwidth.

With CRAY1-A, the data rates (in words per clock period) for block
transfers (instructions 034-037, 176, 177) to and from main memory
are shown in the table below. These data rates are a function of
the address increment (K) used by the block transfer (one for 034-037,
(Ak) for 176, 177).

K mod 16   Data Rate (wds/cp)     K mod 16   Data Rate (wds/cp)

    0            .25                  8            .5
    1            4                    9            4
    2            2                   10            2
    3            4                   11            4
    4            1                   12            1
    5            4                   13            4
    6            2                   14            2
    7            4                   15            4
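
The table can be used directly as a lookup. The Python sketch below is illustrative only: the function names are hypothetical, start-up and reservation overheads are ignored, and the closed form in `rate_from_gcd` is an observation about the table values rather than a rule stated in this report.

```python
from math import gcd

# Data rates (words per clock period) copied from the CRAY1-A table,
# indexed by K mod 16.
DATA_RATE = {
    0: 0.25, 1: 4, 2: 2, 3: 4, 4: 1, 5: 4, 6: 2, 7: 4,
    8: 0.5, 9: 4, 10: 2, 11: 4, 12: 1, 13: 4, 14: 2, 15: 4,
}


def data_rate(k):
    """Words per clock period for a block transfer with address increment k."""
    return DATA_RATE[k % 16]


def transfer_clocks(n_words, k):
    """Clock periods spent moving n_words at the steady-state rate only."""
    return n_words / data_rate(k)


def rate_from_gcd(k):
    """Observed pattern: every table entry equals 4 / gcd(k mod 16, 16)."""
    return 4 / gcd(k % 16, 16)
```

For example, a 64-word transfer with unit increment spends 16 clock periods at the steady-state rate, while an increment of 8 stretches the same transfer to 128 clock periods.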

When CRAY1-A is selected, chaining a vector arithmetic instruction
off of a vector memory load (176) is disallowed. This is because
of  the  possible  imbalance  in  data  rates  between  the  two  instructions. 
In  general,  this  should  not  be  a  hardship  since  a  reordering  of 
vector  instructions  usually  allows  one  to  stagger  the  vector  memory 
references to run in parallel with the arithmetic vector instructions.

MEMORY  Default: SECDED

The first Cray-1 built by Cray Research Inc. has a memory
which is protected by parity checking only. This was later found
to be unsatisfactory and subsequent machines were built with SECDED
(single error correction - double error detection) memory
protection. With SECDED on the memory, the access
path to memory is one clock period longer than on the parity checked
memory. This timing difference is user selectable in the simulator.

By setting MEMORY to the value PARITY (e.g., SET MEMORY=PARITY),
timing with the parity checked memory is possible. When the simulator
starts up the default memory timing is SECDED.

OUTPUT  Default:  *MSINK* 

The OUTPUT keyword controls the file or device to which the
simulator sends all normal output (i.e., not prompts or error messages,
which always go to the terminal). Normal output includes informational
messages, DISPLAY, HELP, and STAT output. When the
simulator starts up OUTPUT is set to *MSINK* (the terminal). With
the SET command OUTPUT may be set to another file or device. A keyboard
attention will switch the output back to *MSINK* automatically.

TIMING  Default:  OFF 

The TIMING keyword controls simulator resource timing. If
TIMING is off, only the results of instruction execution are computed.
If TIMING is on, resource timing, reservation and issue constraints
are simulated. By default, TIMING is off when the simulator
starts up. Setting TIMING on increases the simulation cost by a
factor of eight to ten. TIMING may also be enabled and disabled
by the ERT and DRT instructions respectively. See section 5 for
more explanation of ERT and DRT. Timing must be enabled to produce
the CPACT report. However, the enabled or disabled state of CPACT
is independent of the setting of TIMING. That is, turning TIMING
on and off won't affect the enabled state of CPACT. The
CPACT report is not produced while TIMING is disabled, but it will
be resumed when TIMING is turned back on.

TASK  Default: OFF

The TASK keyword controls gathering of task statistics.
Setting TASK equal to ON also enables resource timing (TIM=ON),
which increases simulation cost by a factor of 8 to 10. The
default state of TASK is OFF. When TASK is turned on, the
simulator prompts for the name of a file containing task definitions.
The contents of the file control the format of the
TACT Report as well as the definition of tasks in simulator
memory.

A  task  is  a  section  of  code  that  has  a  unique  entry  and 
exit point. When a cpu enters the task, the task's name is
entered in that cpu's column in the TACT Report. When the cpu
passes  through  the  exit  point,  the  task  name  is  removed  from  the 
cpu's  column.  Upon  entry  and  exit  from  a  task,  timing  information 
is  recorded  for  later  use  in  the  TACT  STAT  report.  For  a  detailed 
description of the Task Definition file, see Appendix K.


STAT

Command description

Purpose : To print out Cray-1 resource usage statistics
Prototype : STAT [FULL]

Description: 

This command will print on the current output device a summary
report of Cray-1 resource usage. This report is composed
of the following three sections:

1.  Vector  Usage  counts 

2.  Floating  point  result  counts 

3.  Data  traffic  counts 

The  vector  usage  counts  section  is  a  timing  measure  of  the 
program's  vector  use  of  the  Cray-1  vector  functional  units.  The 
data for this section is only collected when TIMING is ON. If
TIMING  is  OFF  when  the  STAT  command  is  issued,  this  section  will 
not  be  printed  since,  most  likely,  it  would  all  be  zero. 

The  floating  point  result  counts  section  is  a  measure  of  the 
program's use of floating point computation. Floating point addition,
multiplication and reciprocal operations are tabulated for
both vector and scalar instructions. The data for this section
is always collected regardless of the state of TIMING.

The  data  traffic  counts  section  is  a  measure  of  the  data 
(operands  &  results)  flow  throughout  the  Cray-1.  Each  major  Cray-1 
data  path  is  illustrated  on  a  figure,  that  is  part  of  the  report, 
along  with  the  amount  of  traffic  on  each  path.  Also  included  in 
this  section  are  some  calculations  of  ratios  and  percentages 
based  on  the  data  traffic  statistics.  The  formulas  used  for  each 
calculation  are  printed  beyond  column  80  of  the  line  containing 
the  calculated  number.  Normally,  these  formulas  won't  appear  on 
an  80  column  terminal,  but  will  be  printed  if  the  STAT  output  is 
diverted (via SET OUTPUT=*PRINT*) to the line printer.

The  data  for  this  section  is  always  collected  regardless  of 
the  state  of  TIMING.  This  section  will  not  be  printed  unless  the 
FULL option is specified on the STAT command. The INIT command
will reinitialize the STAT data collection.

This discussion is intended as a brief command description.
For  a  detailed  discussion  of  both  the  STAT  and  CPACT  reports 
see  section  2.5. 

Examples: 

STAT 

STAT  FULL 

SET OUTPUT=*PRINT*

STAT

SET OUTPUT=*MSINK*

Error  responses: 

Extraneous  parameter  on  STAT  command  - 

This occurs if FULL is misspelled or improperly abbreviated.


STOP

Command description


Purpose  :  To  terminate  execution  of  the  Cray-1  simulator. 

Prototype  :  STOP 

Description: 

The STOP command terminates execution of the simulator, releases
virtual memory used by the simulator and returns to MTS.

Examples : 

STOP 

Error  Responses: 

None. 


TRACE 

Command  Description 

Purpose : To control the generation of the trace output report,
a report of data transfers for each instruction.
Prototype : TRACE ON|OFF [fdname] [LEN = VL|trace length]

Description:

The TRACE command enables and disables the trace output report.
The trace output report consists of the instruction parcel address,
instruction mnemonic, and the contents of relevant registers. If
fdname is not specified the output is sent to the fd specified by
the SET OUTPUT=fd command (the default is *MSINK*).

When "LEN = VL" is specified all results produced by vector
operations will be displayed. In the case of B and T block transfers
all elements transferred will be displayed. If "LEN = n"
(0 < n ≤ 6*10^[illegible]) is specified, n elements are displayed on vector
operations. The minimum of n and the block transfer length will be
displayed for B and T block transfers. (The default value is LEN = VL.)

The trace length may also be set using the SET command: "SET
LEN = trace length". The following page shows a trace output example.
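
The effect of the LEN parameter can be sketched as below. This is an illustrative Python fragment, not simulator code: the function name `displayed_elements` is hypothetical, and capping the count at the vector length for vector operations (as the text states explicitly only for B and T block transfers) is an assumption.

```python
def displayed_elements(length, trace_len="VL"):
    """Number of elements shown in the trace for one instruction.

    length    -- vector length (VL) or block transfer length of the instruction
    trace_len -- "VL" (the default) or a positive integer n
    """
    if trace_len == "VL":
        return length
    if not isinstance(trace_len, int) or trace_len <= 0:
        # Mirrors the "INVALID RIGHT HAND SIDE" error response.
        raise ValueError("INVALID RIGHT HAND SIDE: %r" % (trace_len,))
    return min(trace_len, length)
```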

EXAMPLES: 

TRACE  ON 

T ON -A L=10

T ON L=VL

T ON LEN=20

T  OFF 

ERROR  MESSAGES: 

ERROR - INVALID RIGHT HAND SIDE: rhs (e.g. LEN = -1)

ERROR - INVALID LEFT HAND SIDE: lhs (e.g. LENT = 10)

***  INVALID  TRACE  COMMAND  PARAMETERS 

*** INVALID TRACE COMMAND FDNAME


USE

Command  description 

Purpose  :  To  switch  the  command  input  stream  to  a  file. 

Prototype : USE fdname [NOECHO]

Description: 

This  command  allows  the  user  to  put  a  long  or  frequently  used 
command  sequence  in  a  file  and  have  the  simulator  process  those 
commands  from  that  file.  The  fdname  parameter  is  replaced  with  the 
name  of  an  MTS  file  or  device  from  which  the  simulator  will  read 
subsequent commands. Commands read from the file will automatically
echo  onto  the  current  output  device  unless  the  optional  NOECHO 
parameter  is  specified. 

Any  end-of-file  condition  or  an  ENDFILE  command  will  terminate 
the  USE  command.  This  will  pop  the  command  stack  causing  input  to 
resume  with  the  previous  source.  The  command  stack  is  fifteen  levels 
deep,  allowing  the  user  to  nest  USE  commands  as  desired. 

A  keyboard  attention  may  be  used  to  abort  any  and  all  outstanding 
USE  commands  by  resetting  the  command  stack.  This  will  cause  the 
terminal to be the current input device.
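
The nesting behaviour described above amounts to a bounded stack of input sources. The Python sketch below is a toy model under that reading; the class and method names are hypothetical and nothing here reflects the simulator's actual FDUB handling.

```python
MAX_DEPTH = 15  # the command stack is fifteen levels deep


class CommandStack:
    """Toy model of nested USE command sources (names are illustrative)."""

    def __init__(self, terminal="*MSOURCE*"):
        self.terminal = terminal
        self.sources = []  # innermost USE file is last

    def current(self):
        """Source from which the next command would be read."""
        return self.sources[-1] if self.sources else self.terminal

    def use(self, fdname):
        """USE fdname: push a new command source."""
        if len(self.sources) >= MAX_DEPTH:
            raise OverflowError("FDUB command stack overflow")
        self.sources.append(fdname)

    def end_of_file(self):
        """End-of-file or ENDFILE: resume reading from the previous source."""
        if self.sources:
            self.sources.pop()

    def attention(self):
        """Keyboard attention: abort every outstanding USE command."""
        self.sources.clear()
```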

Examples : 

USE  DISPFILE 

USE  *MSOURCE*  -  to  read  from  the  terminal 
U  CMOS  NOECHO 




Error  Responses: 

USE command unable to open file -

The given fdname doesn't exist or access not allowed.

FDUB command stack overflow -

Attempt to nest more than 15 USE commands.


4. Cray-M Simulation Costs

Because instruction-level simulation is admittedly costly,
it  is  important  to  utilize  only  the  level  of  simulation  (numerical 
versus  timing)  and  the  reporting  level  appropriate  to  the  need. 
Fortunately,  the  interactive  nature  of  the  simulator  makes  it 
possible  to  switch  these  levels  on  and  off  during  a  run;  the  most 
costly  levels  are  rarely  required  for  an  entire  simulation. 

Table  1  gives  the  costs  of  running  1000-5000  instructions  with 
a variety of simulation and reporting levels. (Note that semaphores
and shared registers rotate without timing on; see Appendix E.)
Among the figures given, the most significant appear to be

(a) an 8970:1 slowdown from uniprocessor CRAY-1 time to Amdahl
time; the simulation costs increase approximately linearly
with the number of processors.

(b) a 3.3:1 ratio of costs between simulation with timing on
and timing off, per processor; this ratio has been found
to be as high as 5:1 for highly vectorized code.

As a benchmark case, a million clock, 4-processor run costs
approximately $100 at minimum rates (4-7 am) and $500 at regular
rates.


                    Number of Processors
                    1        2        4      Units

TIM-OFF           1.52     1.32     1.30     ¢/kiloinstruction*
TIM-ON            2.07     3.88     9.23     ¢/kiloclock
                  8970    17200    37500     Amdahl time/CRAY-1 time
TRACE ON+Δ          88       63       46     ¢/kiloinstruction*
CPACT               72      130      250     ¢/kiloclock

* Instructions summed across all processors
+ TIM-OFF
Δ Printing costs not included

Table 1. Simulation costs; minimum rates used
(20% normal rates), approximately 19¢/sec; approximately
2.4 clocks/(instruction issue) (33
MIPS) per processor in code used.
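
The benchmark figure and the timing on/off ratio quoted in this section can be reproduced from Table 1 with a little arithmetic. The Python sketch below is one way to do so; the function names are hypothetical, and the conversion between kiloclocks and kiloinstructions via the 2.4 clocks per issue in the caption is an assumption about how the 3.3:1 ratio was obtained.

```python
# Values taken from Table 1 (minimum rates), keyed by number of processors.
CENTS_PER_KILOCLOCK_TIM_ON = {1: 2.07, 2: 3.88, 4: 9.23}
CENTS_PER_KILOINSTR_TIM_OFF = {1: 1.52, 2: 1.32, 4: 1.30}
CLOCKS_PER_ISSUE = 2.4         # from the Table 1 caption
MINIMUM_RATE_FRACTION = 0.20   # minimum rates are 20% of normal rates


def timing_run_cost_dollars(clocks, processors, regular_rates=False):
    """Cost in dollars of a TIM=ON run lasting the given number of clocks."""
    cents = CENTS_PER_KILOCLOCK_TIM_ON[processors] * clocks / 1000.0
    if regular_rates:
        cents /= MINIMUM_RATE_FRACTION
    return cents / 100.0


def uniprocessor_timing_ratio():
    """TIM=ON cost per kiloinstruction over TIM=OFF cost, one processor."""
    on_per_kiloinstr = CENTS_PER_KILOCLOCK_TIM_ON[1] * CLOCKS_PER_ISSUE
    return on_per_kiloinstr / CENTS_PER_KILOINSTR_TIM_OFF[1]
```

Under these assumptions a million clock, 4-processor run comes to about $92 at minimum rates and about $460 at regular rates, in line with the rounded $100 and $500 quoted above, and the uniprocessor ratio comes to about 3.3.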


1)  A  CRAY-1  Simulator,  publication  No.  118,  Systems  Engineering 
Laboratory,  University  of  Michigan  by  D.  A.  Orbits,  1978. 

2)  Cray-1  Hardware  Reference,  publication  No.  2240004, 
by  Cray  Research  Inc.,  1980. 

3) Cray-1 Fortran Reference Manual, publication no.
2240009, by Cray Research Inc., 1980.

4) Cray-1 External Reference Specification, publication
no. 2240011, by Cray Research Inc., 1976.

5) Cray-1 Fortran Mathematical Subprogram Library Reference
Manual, publication no. 2240014, by Cray Research Inc., 1977.

6) Cray-1 Reference Card, publication no. SQ-0003, by Cray
Research Inc., 1981.

7) Cray-1 CAL Assembler Reference Manual, publication no.
SR-0000, 1980.

8) Introduction to Vector Processing on the Cray-1 Computer,
publication no. 2240002, 1975.

9) The Cray-1 Computer System, CACM, by Richard M. Russell,
Cray Research Inc., January 1978, pp. 63-72.


Appendix  A. 

Summary  of  Cray-1  Timing  Information 

This  material  has  been  borrowed  from  the  Cray-1  Reference 
Manual,  publication  number  2240004,  by  Cray  Research,  Inc. 


When issue conditions are satisfied an instruction completes in a fixed
amount of time. Instruction issue may cause reservations to be placed
on a functional unit or registers. Knowledge of the issue conditions,
instruction execution times and reservations permits accurate timing of
code sequences. Memory bank conflicts due to I/O activity are the only
element of unpredictability.

SCALAR  INSTRUCTIONS 

Four conditions must be satisfied for issue of a scalar instruction:

1. The functional unit must be free. No conflicts can arise with other
scalar instructions; however, vector floating point instructions
reserve the floating point units. Memory references may be delayed
due to conflicts.

2. The result register must be free.

3. The operand register must be free.

4. Issue is delayed 1 clock period if a result register group input path
conflict would exist with a previously issued instruction. One input
path exists for each of the four register groups (A, B, S and T).

Scalar instructions place reservations only on result registers. A result
register is reserved for the execution time of the instruction. No
reservations are placed on the functional unit or operand registers.

A transmit scalar mask instruction to Si (073) is delayed
by (VL) + 6 clock periods from the issue of a previous vector mask
(175) instruction, and is delayed by 6 clock periods from the issue
of a preceding transmit (Sj) to VM (003) instruction.


Execution  times  in  clock  periods  are  given  below.  An  asterisk  indicates 
that  issue  may  be  delayed  because  of  a  functional  unit  reservation  by  a 
vector  instruction.  Memory  may  be  considered  a  functional  unit  for  timing 
considerations. 

(A=A register, M=Memory, B=B register, S=S register, I=Immediate, C=Channel)


24-bit results:

A ← M            11*        A ← C              4
M ← A             1*        A ← A+A            2
A ← B             1         A ← AxA            6
B ← A             1         A ← pop(S)         4
A ← S             1         A ← lzc(S)         3
A ← I             1         VL ← A             1

64-bit results:

S ← M            11*        S ← S+S            3
M ← S             1*        S ← S(f.add)S      6*
S ← T             1         S ← S(f.mult)S     7*
T ← S             1         S ← S(r.a.)       14*
S ← I             1         S ← V              5
S ← S(log.)S      1         V ← S              3
S ← S(shift)I     2         S ← VM             1
S ← S(shift)A     3         S ← RTC            1
S ← S(mask)I      1         S ← A              2
RTC ← S           1         VM ← S             3

* Issue may be delayed because of a functional unit reservation by a
vector instruction. Memory may be considered a functional unit for
timing considerations.

VECTOR  INSTRUCTIONS 

Four  conditions  must  be  satisfied  for  issue  of  a  vector  instruction: 

1.  The  functional  unit  must  be  free.  (Conflicts  may  occur  with  vector 
operations.) 

2.  The  result  register  must  be  free.  (Conflicts  may  occur  with  vector 
operations.) 

3.  The  operand  registers  must  be  free  or  at  chain  slot  time. 

4. Memory must be quiet if the instruction references memory.

Vector instructions place reservations on functional units and registers
for the duration of execution.

1. Functional units are reserved for VL+4 clock periods. Memory is
reserved for VL+5 clock periods on a write operation, VL+4 clock
periods on a read operation.

2. The result register is reserved for the functional unit time
+(VL+2) clock periods. The result register is reserved for the
functional unit time +7 clock periods if the vector length is less than
5. At functional unit time +2 (chain slot time) a subsequent
instruction, which has met all other issue conditions, may issue. This
process is called "chaining." Several instructions using different
functional units may be chained in this manner to attain a significant
enhancement of processing speed.

3. Vector operand registers are reserved for VL clock periods. Vector
operand registers are reserved for 5 clock periods if the vector
length is less than 5. The vector register used in a block store to
memory (177 instruction) is reserved for VL clock periods. Scalar
operand registers are not reserved.

Vector  instructions  produce  one  result  per  clock  period.  The  functional 
unit  times  are  given  below.  The  vector  read  and  write  instructions 
(176,  177)  produce  results  more  slowly  if  bank  conflicts  arise  due  to 
the  increment  value  (Ak)  being  a  multiple  of  8.  Chaining  cannot  occur 
for  the  vector  read  operation  in  this  case. 

If (Ak) is an odd multiple of 8†, results are produced every 2 clock
periods.

If (Ak) is an even multiple of 8†, results are produced every 4 clock
periods.


Functional unit               Time (c.p.)

Logical                            2
Shift                              4
Integer add                        3
Floating add                       6
Floating multiply                  7
Reciprocal approximation          14
Memory                             7
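
The reservation rules for vector instructions can be collected into a small calculator. The Python sketch below is illustrative only; the function name and dictionary keys are hypothetical, and the VL+5 memory reservation for write operations is deliberately not modelled.

```python
# Functional unit times in clock periods, from the table above.
FUNCTIONAL_UNIT_TIME = {
    "logical": 2, "shift": 4, "integer add": 3, "floating add": 6,
    "floating multiply": 7, "reciprocal approximation": 14, "memory": 7,
}


def vector_reservations(unit, vl):
    """Reservation lengths (clock periods) for one vector instruction."""
    fu_time = FUNCTIONAL_UNIT_TIME[unit]
    return {
        # Functional units are reserved for VL+4 clock periods.
        "functional_unit": vl + 4,
        # Result register: functional unit time + (VL+2), or +7 when VL < 5.
        "result_register": fu_time + (vl + 2 if vl >= 5 else 7),
        # Operand registers: VL clock periods, or 5 when VL < 5.
        "operand_register": vl if vl >= 5 else 5,
        # A dependent instruction may chain at functional unit time + 2.
        "chain_slot": fu_time + 2,
    }
```

For a floating add with VL = 64 this gives a 68 clock functional unit reservation, a 72 clock result register reservation and a chain slot at clock 8 after issue.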




† Multiple of 4 for 8-bank phasing; refer to section 5.


Memory must be quiet before issue of the B and T register block copy
instructions (034-037). Subsequent instructions may not issue for 14 + (Ai)
clock periods if (Ai)≠0 and 5 clock periods if (Ai)=0 when reading
data to the B and T registers (034,036). They may not issue for 6+(Ai)
clock periods when storing data (035,037).
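
The issue delays that follow a B or T block copy reduce to a three-way rule, sketched here in Python. The function name and argument names are hypothetical, and the 8-bank phasing variant is not covered.

```python
def block_copy_issue_delay(ai, reading):
    """Clock periods before a subsequent instruction may issue.

    ai      -- block length held in register Ai
    reading -- True for 034/036 (memory to B or T registers),
               False for 035/037 (B or T registers to memory)
    """
    if reading:
        return 14 + ai if ai != 0 else 5
    return 6 + ai
```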

The  B  and  T  register  block  read  (034,036)  instructions  require  that  there 
be  no  register  reservation  on  the  A  and  S  registers,  respectively,  before 
issue. 

Branch instructions cannot issue until an A0 or S0 operand register has
been free for one clock period. Fall-through in buffer requires two
clock periods. Branch-in-buffer requires five clock periods. When an
"out of buffer" condition occurs the execution time for a branch
instruction is 14 clock periods.†

A  two  parcel  instruction  takes  two  clock  periods  to  issue. 

Instruction  issue  is  delayed  2  clock  periods  when  the  next  instruction 
parcel  is  in  a  different  instruction  parcel  buffer.  Instruction  issue  is 
delayed  14  clock  periods  if  the  next  instruction  parcel  is  not  in  an 
instruction  parcel  buffer. 

HOLD  MEMORY 

A delay of 1, 2, or 3 CP will be added to a scalar memory read if a bank
conflict occurs with rank C, B, or A, respectively, of the memory access
network. A conflict occurs if the address is in the same bank as the
address in rank C, B, or A. Conflicts can occur only with scalar or I/O
references. The scalar instruction senses the conflict condition at
issue time + 1 CP. The scalar instruction address enters rank A of the
memory access network at issue time + 1 CP. The scalar instruction
address enters rank B at issue + 2 CP. The scalar instruction address
enters rank C at issue + 3 CP.

† 18 clock periods for 8-bank phasing option; refer to section 5.


Scalar instruction timing (no conflict):

CP n       Issue, reserve register
CP n+1     Address rank A, sense conflict
CP n+2     Address rank B
CP n+3     Address rank C
CP n+9     Clear register reservation
CP n+10    Issue
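
The hold memory rule can be sketched as a lookup on the rank in which the conflicting address sits. This Python fragment is illustrative: the function name and the `rank_banks` representation are hypothetical, and taking the largest delay when more than one rank conflicts is an assumption not stated in the text.

```python
# Extra delay (clock periods) when the new address falls in the same bank
# as the address currently held in a given rank of the access network.
HOLD_MEMORY_DELAY = {"C": 1, "B": 2, "A": 3}


def hold_memory_delay(bank, rank_banks):
    """Delay added to a scalar memory read.

    bank       -- bank of the address being read
    rank_banks -- dict mapping "A", "B", "C" to the bank held in that rank,
                  or None when the rank is empty
    """
    delays = [HOLD_MEMORY_DELAY[r] for r in "ABC" if rank_banks.get(r) == bank]
    return max(delays, default=0)
```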

HOLD  ISSUE 

A delay of issue results if a 100 - 137 instruction is in the NIP register
and a hold memory condition exists. The delay will depend on the hold
memory delay.

A delay of issue results if a 100 - 137 instruction is in the NIP register
and a 100 - 137 instruction in process senses a conflict with rank A, B,
or C.

An  additional  1  CP  delay  is  added  to  a  hold  memory  condition  if  a  070 
instruction  destination  register  conflict  is  sensed. 


Appendix B.


Cray-1  Simulator  I/O  Device  Usage 


I/O  in  the  Cray-1  simulator  is  done  in  two  ways: 


1) Through the use of standard Fortran data set
reference numbers (DSRN), and


2) Through the use of an MTS environment file or device
usage block (FDUB).


The  following  I/O  is  done  through  DSRNs : 


-  All  error  messages  use  I/O  unit  0. 

- All CPACT and TACT output uses I/O units 30 through 64.

-  All  LOAD  module  input  uses  I/O  unit  2. 

-  All  normal  Terminal  output  (echoing,  etc.)  uses  I/O 
unit  3. 

-  All  memory  to  memory  I/O  used  for  number  conversion,  etc., 
uses  I/O  unit  20. 


The  following  I/O  is  done  through  MTS  provided  FDUBs: 


- All command input, whether from a call-file, an AT-file,
a USE-file or the terminal is read using FDUBs. The command
input stack is implemented with FDUBs.

-  All  HELP  file  responses  are  read  from  a  file  using  a  FDUB. 

-  The  simulator  driver  tables  are  loaded  at  start  up  time 
using  a  FDUB. 


The  user  should  not  use  DSRNs  0,  1,  2,  3,  20  and  30  through  64. 


Appendix C.

Cray-1 Simulator Common Block Usage


The  Cray-1  simulator  currently  uses  30  Fortran  named  common 
blocks. Except for /MEMORY/ and /MSIZE/ the user should not define
symbols (subroutines or named common blocks) that conflict
with  common  block  names  used  by  the  simulator.  These  common 
block  names  are  listed  below: 


ACTFLG 

QCODSS 

BRKCOM 

QCOM 

COM$F 

REG 

CONTRL 

REPORT 

CTABLE 

SETABL 

DECTBL 

STATE 

DEV 

SYMTB1 

DRVTBL 

SYMTB2 

INSTRX 

TRAPAR 

LIP 

TRKBLK 

LOAD 

UNITS 

MEMORY 

USAGE 

MSFLAG 

XCHANG 

MSIZE 

NEW 

TASKS 

$ 
Appendix D.

Establishing  the  simulator  on  MTS 

In  addition  to  the  object  module  which  contains  the  Cray-1 
simulator,  three  additional  files  and  one  initialization  program 
are  part  of  the  simulator. 

The initialization program (TABINIT) processes the instruction
driver  table  used  by  the  simulator.  TABINIT  converts  the  driver 
table  from  a  character  format  to  an  internal  binary  format  which 
may  be  quickly  read  by  the  simulator  when  it  starts  up.  This 
program  is  only  needed  if  one  changes  the  driver  table. 

The  three  additional  files  are: 

1) OPFILE     : The character format driver table used
                as input to TABINIT. (Not directly
                necessary to use the simulator.)

2) TABLES.DAT : The binary file which is output by
                TABINIT. This file is needed to run
                the simulator.

3) HELP       : This file contains the help responses.
                It is not essential to use the simulator.

If TABLES.DAT and HELP are available under the CCID that is
running  the  simulator,  they  will  be  read  as  they  are  needed. 

Alternatively,  one  can  recompile  the  subroutine  OPFDUB  (open 
fdub) ,  after  modifying  the  CCID  defined  in  a  DATA  statement.  This 
CCID  should  point  to  an  alternate  MTS  catalog  where  TABLES.DAT  and 
HELP  can  be  found. 


APPENDIX  E 

CRAY-M  instructions  to  simulate 
shared  registers  and  semaphores  * 

In developing the CRAY-M simulator, we decided that some means
of close communication between the processors should be provided.
We therefore added eight shared T registers, eight shared B registers
and sixty-four semaphore registers. There are three instructions
for manipulating the 64 semaphore registers, two instructions for the
shared T registers and two instructions for the shared B registers.

To avoid conflict, access to the semaphores and shared
registers "rotates" between the active CPU's. This rotation is
based on the Real Time clock register when timing is enabled, and
on  the  number  of  instructions  issued  when  timing  is  disabled.  It 
should  be  noted  that  this  difference  in  rotation  methods  may  cause 
different  results  in  tightly  coupled  algorithms. 

All  timings  and  protocol  (such  as  rotation  and  the  phasing 
of  shared  registers)  are  the  author's  choices  and  do  not  necessarily 
reflect  behavior  of  a  product  of  Cray  Research,  Inc. 


SMjk 0      Clear semaphore jk. Semaphore register jk is set
            to 0. Instruction will take from 1 to 4 clocks to
            complete.

SMjk 1      Set semaphore jk. Semaphore register jk is set to
            1. Instruction will take from 1 to 4 clocks to
            complete.

SMjk 1,TS   Test and set semaphore jk. If semaphore register
            jk is 0, set it to 1 and continue. If semaphore
            register jk is 1, hold issue on this instruction
            (i.e., until a different cpu sets the semaphore
            to 0).

Sj STk      Enter Sj with STk (shared T register k). This instruction
            will take from 1 to 4 clocks to complete,
            but is phased to execute immediately following a
            semaphore instruction.

STj Sk      Enter STj (shared T register j) with Sk. This instruction
            will take from 1 to 4 clocks to complete,
            but is phased to execute immediately following a
            semaphore instruction.

Aj SBk      Enter Aj with SBk (shared B register k). This instruction
            will take from 1 to 4 clocks to complete,
            but is phased to execute immediately following a
            semaphore instruction.
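
The semaphore instructions and the rotation of access between CPUs can be modelled in a few lines. The Python sketch below is a toy under stated assumptions: the class, method and function names are hypothetical, and the round-robin rule in `access_slot_owner` (counter modulo the number of active CPUs) is only one plausible reading of "rotates", not the documented protocol.

```python
class Semaphores:
    """Toy model of the 64 semaphore registers (illustrative only)."""

    def __init__(self):
        self.sm = [0] * 64

    def clear(self, jk):
        """SMjk 0: clear semaphore jk."""
        self.sm[jk] = 0

    def set(self, jk):
        """SMjk 1: set semaphore jk."""
        self.sm[jk] = 1

    def test_and_set(self, jk):
        """SMjk 1,TS: return True if the instruction issues,
        False if it must hold issue because the semaphore is already 1."""
        if self.sm[jk] == 0:
            self.sm[jk] = 1
            return True
        return False


def access_slot_owner(counter, active_cpus):
    """CPU whose turn it is, assuming simple round-robin on a counter.

    counter is the real time clock when timing is enabled, or the number
    of instructions issued when it is disabled.
    """
    return active_cpus[counter % len(active_cpus)]
```

A CPU that receives False from `test_and_set` would retry on a later turn, which corresponds to holding issue until another CPU clears the semaphore.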
Appendix F.

Cray-M Simulator Error Stops

This  appendix  discusses  possible  simulator  error  stops. 

These  error  stops  are  caused  by  internal  simulator  errors  that 
could  adversely  affect  simulation  results  if  simulation  were 
allowed  to  proceed. 

Some  error  stops  print  an  error  message  prior  to  halting, 
other  stops  only  indicate  a  stop  code.  The  list  of  error  stops 
below  is  separated  into  two  groups:  those  that  print  a  message 
and  those  that  indicate  a  stop  code.  The  subroutine  in  which  the 
stop  appears  is  also  noted  below. 

Error  Stops  with  Messages 


Subroutine   Stop Message

GETCMD       END OF FILE ON BATCH INPUT STREAM.
DECODE       INTERNAL ERROR. DECODE TABLES CLOBBERED.
SIMBRK       SIMBRK CALLED BUT BRKSET .LE. ZERO.
SIMBRK       SIMBRK CALLED BUT BREAKPOINT NOT IN TABLE.
WINST        INTERNAL ERROR. DECODE TABLES CLOBBERED.


Error  Stops  with  Stop  Codes 


Subroutine 


SMCTRL 

SMCTRL 

QPROC 

QWRITE 

QWRITE 

SETMSK 

BLDMSK 

DECODE 

WINST 


Stop 


invalid  bit  code  in  MSW. 
Invalid  argument  to  MSW. 
Unimplemented  action  code  used. 

m  n  n  n  rt 

Invalid  queue  action  code. 
Queue  space  exhausted, 
invalid  queue  pointer. 

Invalid  bit  code  in  SETMSK. 
Invalid  hold  issue  code.. 

Bad  instruction  format  code. 
Bad  instruction  format  code. 


SMCTRL 


Invalid  action  code  used. 


RESERV    115    Invalid reservation code.
ACTION    116    Action held and Queue empty.
QWRITE    118    Invalid clock period argument.
RESG00    200    Invalid G-field dispatch code.
RESG01    201    Invalid G-field dispatch code.
RESG02    202    Invalid G-field dispatch code.
RESG03    203    Invalid G-field dispatch code.
RESG04    204    Invalid G-field dispatch code.
RESG05    205    Invalid G-field dispatch code.
RESG06    206    Invalid G-field dispatch code.
RESG07    207    Invalid G-field dispatch code.
RESG14    214    Invalid G-field dispatch code.
RESG15    215    Invalid G-field dispatch code.
RESG16    216    Invalid G-field dispatch code.
RESG17    217    Invalid G-field dispatch code.
TRACK     300    Invalid track command code.
ENTRAP    400    Floating point interrupt process failure.
ENTRAP    401    Floating point interrupt process failure.
ENATTN    402    Attention process failure.
ENATTN    403    Attention process failure.
RESG07   1071    Invalid J-field dispatch code.


Appendix G

Program Availability Information

Name:             Cray-M Simulator

Language:         IBM Fortran-IV
                  IBM Assembly Language

Operating System
Requirements:     The only system subroutines needed are those
                  provided by the standard IBM FORTRAN IV Subroutine
                  Library (e.g., MAX0, MIN0, etc.).

                  All I/O is done via FORTRAN READ and WRITE
                  statements with record lengths of 80 bytes
                  or less. Hence, the simulator should run on
                  any IBM-based operating system.

Availability:     Source code for the simulator is available on
                  9-track EBCDIC tapes. Tapes can be made according
                  to any blocking format, can be labelled
                  or unlabelled, and can be made at 800, 1600, or
                  6250 BPI. The entire Simulator/Cross-Assembler
                  package is approximately 300,000 bytes long.

Contact:          Professor D. A. Calahan
                  Dept. of Electrical and Computer Engrg.
                  University of Michigan
                  Ann Arbor, MI 48109
                  (313) 763-0036




Sample  Simulator  Exit  Dispatcher 

      SUBROUTINE CRAYEX(IJK, ASR, SSR, VSR, VL, EXSW)
      IMPLICIT INTEGER (A-Z)
      LOGICAL EXSW, NOANS/.TRUE./
      INTEGER ASR(8), RTC(2)
      REAL*8 SSR(8), VSR(64,8), DS
      EQUIVALENCE (RTC(1),DS)
C
C ... THIS EXIT PROCESSOR IS USED BY A FULL MATRIX LU FACTORIZATION
C     PROGRAM.  TWO EXIT FUNCTIONS ARE PROVIDED:
C
C     EX 1 - INITIALIZES THE SQUARE MATRIX IN CRAY-1 MEMORY.
C            REGISTER A1 POINTS TO THE MATRIX.
C            REGISTER A2 CONTAINS THE MATRIX SIZE.
C
C     EX 2 - PRINTS OUT THE RUN TIME AND THE MATRIX SIZE.
C            REGISTER A7 CONTAINS THE MATRIX SIZE.
C            REGISTER S7 CONTAINS THE REAL TIME CLOCK VALUE.
C            OPTIONALLY, IF THE LOGICAL VARIABLE 'NOANS' IS .FALSE.
C            THEN THE "EX 2" WILL ALSO PRINT THE MATRIX SOLUTION.
C
C     COMMON BLOCK FOR CRAY-1 MEMORY.
C
      DOUBLE PRECISION MEM
      COMMON /MSIZE/ MEMSIZ
      INTEGER*2 HMEM(1)
      INTEGER IMEM(2,4096)
      COMMON /MEMORY/ MEM(4096)
      EQUIVALENCE (MEM(1), IMEM(1,1), HMEM(1))
C
C ... DISPATCH ON THE EXIT CODE (IJK)
C
      GO TO (100,200), IJK
      EXSW = .TRUE.
      RETURN
C
C ... CODE=1  INITIALIZE MATRIX (A1->BASE, A2=SIZE)
C
  100 N = ASR(2+1)
      MADDR = ASR(1+1)
      SMADDR = MADDR
      DO 140 J=1,N
      K = J
      DO 130 I=1,N
      MEM(MADDR+1) = K
      MADDR = MADDR+1
      K = K-1
      IF (K.LT.1) K = N
  130 CONTINUE
  140 CONTINUE
      RETURN
C
C ... CODE=2  PRINT RUN TIME (A7 = MATRIX SIZE, S7 = RUN TIME)
C
  200 DS = SSR(7+1)
      WRITE(6,1000) ASR(7+1), RTC(2)
 1000 FORMAT('0SIZE=', I5, ' RTC=', I7)
      EXSW = .TRUE.
      IF (NOANS) RETURN
C
      DO 250 I=1,N
  250 WRITE(6,1100) (MEM(SMADDR+(J-1)*N+I), J=1,N)
 1100 FORMAT(1X, 10F7.3)
C
      RETURN
      END
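The fill pattern produced by exit code 1 is easier to see on a small case. The following Python sketch mirrors the initialization loop under the assumption that the matrix is stored column by column in consecutive words, as the print loop for exit code 2 suggests. The function name `init_matrix` is hypothetical and is not part of the simulator.

```python
def init_matrix(n):
    """Return the n-by-n matrix filled the way the EX 1 loop fills memory."""
    flat = []                      # consecutive simulated memory words
    for j in range(1, n + 1):      # outer loop: one pass per column
        k = j
        for _ in range(n):         # inner loop: n consecutive words
            flat.append(float(k))
            k -= 1
            if k < 1:              # wrap around from 1 back to n
                k = n
    # Element (i, j) is word (j-1)*n + (i-1), matching the print loop.
    return [[flat[j * n + i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in init_matrix(3):
        print(row)
```

Under these assumptions each row is a cyclic shift of the row above it, so the test matrix is a circulant.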


C
C ... THIS MAIN PROGRAM CALLS THE CRAY-1 SIMULATOR AS A SUBROUTINE
C     TO SOLVE PARALLEL SYSTEMS OF TRI-DIAGONAL EQUATIONS.  UP TO 64
C     PARALLEL SYSTEMS MAY BE SOLVED.
C
C     THIS PROGRAM PERFORMS THE FOLLOWING FUNCTIONS:
C
C     1. READS TWO INPUT PARAMETERS:
C        NSYS - THE NUMBER OF PARALLEL SYSTEMS TO SOLVE.
C        NEQS - THE NUMBER OF EQUATIONS IN EACH SYSTEM.
C
C     2. ALLOCATES THE CRAY-1 MEMORY FOR THE THREE DIAGONALS AND THE
C        RIGHT HAND SIDE.
C
C     3. INITIALIZES THE SYSTEMS.
C
C     4. LOADS THE CRAY-1 TRI-DIAGONAL LU DECOMPOSITION ROUTINE (TRIDEC)
C        AND INITIALIZES THE CALLING PARAMETERS IN AN ADDRESS LIST
C        IMMEDIATELY PRECEDING THE LOADED PROGRAM.
C
C     5. GIVES CONTROL TO THE USER VIA THE SIMULATOR COMMAND LANGUAGE,
C        ALLOWING THE USER TO RUN THE PROGRAM, SET BREAK POINTS, ETC.
C
C     6. UPON RETURN FROM THE SIMULATOR, PRINT OUT THE SYSTEM AND
C        CALCULATE THE MFLOPS.
C
C     7. LOADS THE BACK-SUBSTITUTION CRAY-1 PROGRAM (TRISLV) AND
C        INITIALIZES ITS CALLING PARAMETERS.
C
C     8. AGAIN GIVES CONTROL TO THE USER TO RUN THE PROGRAM.
C
C     9. UPON RETURN FROM THE SIMULATOR, PRINT OUT THE SYSTEM AND
C        CALCULATE THE MFLOPS.
C
C     THIS PROGRAM TAKES THE PLACE OF THE SIMULATOR'S MAIN PROGRAM SINCE
C     IT IS LOADED FIRST.  IT ALSO EXTENDS THE SIMULATOR MEMORY TO 8192
C     WORDS.
C
      IMPLICIT INTEGER (A-Z)
      COMMON /PARMS/ NSYS, NEQS, ABASE, BBASE, CBASE, YBASE
C
C ... ***** FOLLOWING IS AN EXTENSION OF THE CRAY-1 SIMULATED MEMORY.
C
      DOUBLE PRECISION MEM
      COMMON /MEMORY/ MEM(8192)
      COMMON /MSIZE/ MEMSIZ
      INTEGER*2 HMEM(32768)
      INTEGER IMEM(2,8192)
      EQUIVALENCE (MEM(1), IMEM(1,1), HMEM(1))
C
C ... TELL THE SIMULATOR THE NEW SIZE OF CRAY-1 MEMORY.
      MEMSIZ = 8192
      CALL CRAY1('INIT!', .TRUE.)
C
C ... II - THE I DIRECTION INCREMENT.
C          THE DISTANCE BETWEEN DIAGONAL ELEMENTS.
C ... IJ - THE J DIRECTION INCREMENT.
C          THE DISTANCE BETWEEN PARALLEL ELEMENTS.
C
      II = 1
      READ(5,1000) NSYS, NEQS
 1000 FORMAT(I5/I5)
      IF (NSYS .GT. 64) GO TO 910
      IJ = NEQS
C
C ... SET ARRAY BASES.
      CBASE = 300
      ABASE = CBASE + NSYS*NEQS
      BBASE = ABASE + NSYS*NEQS
      YBASE = BBASE + NSYS*NEQS
      IF (YBASE + NSYS*NEQS .GT. MEMSIZ) GO TO 900
C
C ... INITIALIZE TRIDEC CRAY MEMORY WITH THE TRI-DIAGONAL DATA
C
C ... LOOP THRU THE ELEMENTS OF A SYSTEM TO INITIALIZE.
      DO 10 I = 1,NEQS
C
C ... LOOP THRU ALL PARALLEL SYSTEMS.
C
      DO 10 J = 1,NSYS
      MEM(CBASE+(J-1)*IJ+(I-1)) = 0.1E0
      MEM(ABASE+(J-1)*IJ+(I-1)) = 1.0D0
      MEM(BBASE+(J-1)*IJ+(I-1)) = 1/10.0D0
      MEM(YBASE+(J-1)*IJ+(I-1)) = I*1.0D0
   10 CONTINUE
C
C ... CLEAR OUT THE TOP OF C AND THE BOTTOM OF B.
      DO 20 J = 1,NSYS
      MEM(CBASE+(J-1)*IJ+(1-1)) = 0.0D0
      MEM(BBASE+(J-1)*IJ+(NEQS-1)) = 0.0D0
   20 CONTINUE
C
      WRITE(6,1200)
 1200 FORMAT('1CRAY-1 TRI-DIAGONAL SOLVER')
C
      CALL CHECK(0, .FALSE.)
C
C ... SET UP THE ARGUMENT CALL BLOCK WITH POINTERS TO
C     THE ARGUMENTS.
C
C     LOC 100 = NSYS, LOC 101 = NEQS, LOC 102 = II, LOC 103 = IJ,
C     LOC 104 = CLOCK.  (MEM(1) = CRAY-1 ADDRESS ZERO.)
C
      IMEM(2, 9+1) = 68
      IMEM(2,10+1) = 67
      IMEM(2,11+1) = 66
      IMEM(2,12+1) = BBASE-1
      IMEM(2,13+1) = ABASE-1
      IMEM(2,14+1) = CBASE
      IMEM(2,15+1) = 65
      IMEM(2,16+1) = 64
C
C ... SET UP ARGUMENT LOCATIONS.
C
      IMEM(2,68+1) = 0
      IMEM(2,67+1) = IJ
      IMEM(2,66+1) = II
      IMEM(2,65+1) = NEQS
      IMEM(2,64+1) = NSYS
C
C ... LOAD TRIDEC AND GIVE SIMULATOR CONTROL TO THE USER.
C
      CALL CRAY1('COM AFTER TRIDEC LOADS, RUN 21A TO START.!', .FALSE.)
      CALL CRAY1('LOAD SOTO:TRIDEC; USE *MSOURCE*!', .FALSE.)
C
      NOPS = NSYS * (1 + (NEQS-1)*4)
      CALL CHECK(NOPS, .TRUE.)
C
C ... INITIALIZE TRISLV'S ARGUMENT BLOCK WITH ITS POINTERS.
C
      IMEM(2,10+1) = 68
      IMEM(2,11+1) = YBASE-1
      IMEM(2,12+1) = 67
      IMEM(2,13+1) = 66
      IMEM(2,14+1) = BBASE-1
      IMEM(2,15+1) = ABASE-1
      IMEM(2,16+1) = CBASE
      IMEM(2,17+1) = 65
      IMEM(2,18+1) = 64
C
C ... LOAD TRISLV AND GIVE SIMULATOR CONTROL TO USER.
C
      CALL CRAY1('COM AFTER TRISLV LOADS, RUN 23A TO START.!', .FALSE.)
      CALL CRAY1('LOAD SOTO:TRISLV; USE *MSOURCE*!', .FALSE.)
C
      NOPS = NSYS * ((NEQS-1)*2 + NEQS*3)
      CALL CHECK(NOPS, .TRUE.)
C
      STOP
C
C ... NOT ENOUGH MEMORY FOR THE PROBLEM SIZE.
C
  900 MAXPRB = (MEMSIZ - CBASE)/4
      WRITE(6,9000) MAXPRB
 9000 FORMAT('0CRAY-1 MEMORY TOO SMALL FOR THIS PROBLEM SIZE.'/
     +       ' THE LARGEST PRODUCT OF NSYS*NEQS MUST BE < ',I5)
      STOP
C
C ... TOO MANY PARALLEL SYSTEMS.
C
  910 WRITE(6,9010) NSYS
 9010 FORMAT('0THE NUMBER OF PARALLEL SYSTEMS MAY NOT EXCEED 64.'/
     +       ' ',I5,' WAS SPECIFIED.')
      STOP
C
      END
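The memory layout used by this driver can be checked by hand before a run. The sketch below, in Python rather than Fortran, recomputes the four array bases and applies the same two limits as the driver (at most 64 systems, and the end of the Y array inside simulated memory). It assumes that the capacity test compares YBASE plus one array length against MEMSIZ; the names `layout` and `element` are hypothetical helpers, not simulator routines.

```python
def layout(nsys, neqs, cbase=300, memsiz=8192):
    """Return the MEM base subscripts of the C, A, B and Y arrays."""
    if nsys > 64:
        raise ValueError("the number of parallel systems may not exceed 64")
    span = nsys * neqs                      # words occupied by each array
    abase = cbase + span
    bbase = abase + span
    ybase = bbase + span
    if ybase + span > memsiz:               # assumed form of the capacity test
        limit = (memsiz - cbase) // 4       # same bound the driver prints
        raise ValueError(f"NSYS*NEQS must be below {limit}")
    return {"CBASE": cbase, "ABASE": abase, "BBASE": bbase, "YBASE": ybase}


def element(base, i, j, neqs, ii=1):
    """MEM subscript of equation i (1-based) in system j (1-based)."""
    ij = neqs                               # distance between parallel systems
    return base + (j - 1) * ij + (i - 1) * ii


if __name__ == "__main__":
    bases = layout(64, 20)
    print(bases)
    print(element(bases["ABASE"], 1, 2, 20))
```

With II = 1 and IJ = NEQS, the equations of one system are contiguous and successive systems follow one another in memory.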


      SUBROUTINE CHECK(NOPS, PTIME)
      IMPLICIT INTEGER (A-Z)
      COMMON /PARMS/ NSYS, NEQS, ABASE, BBASE, CBASE, YBASE
C
C     COMMON BLOCK FOR CRAY-1 MEMORY.
C
      DOUBLE PRECISION MEM
      COMMON /MEMORY/ MEM(8192)
      COMMON /MSIZE/ MEMSIZ
      INTEGER*2 HMEM(32768)
      INTEGER IMEM(2,8192)
      EQUIVALENCE (MEM(1), IMEM(1,1), HMEM(1))
C
      REAL MFLOPS
      LOGICAL PTIME
C
C ... CALC MFLOPS
C
      RTC = IMEM(2,68+1)
      MFLOPS = 0.0
      IF (RTC.NE.0) MFLOPS = (NOPS * 80.0)/RTC
C
C ... PRINT THE RESULTS
      WRITE(14,4000)
      DO 40 I = 1,NEQS
   40 WRITE(14,5000) I,MEM(CBASE+I-1), I,MEM(ABASE+I-1),
     +               I,MEM(BBASE+I-1), I,MEM(YBASE+I-1)
      IF (PTIME) WRITE(14,6000) RTC, NSYS, NEQS, MFLOPS
C
 4000 FORMAT('-')
C
 5000 FORMAT(' C(',I2,')=',E13.4,' A(',I2,')=',E13.4,
     +       ' B(',I2,')=',E13.4,' Y(',I2,')=',E13.4)
C
 6000 FORMAT(' RTC =',I7,'  SYSTEMS = ',I3,'  SIZE OF SYSTEM =',
     +       I3,'  MFLOPS =',F12.3,/' ')
C
      RETURN
      END
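The rate calculation in CHECK can be reproduced outside the simulator. The Python sketch below assumes that the factor 80.0 stands for the CRAY-1 clock rate of 80 million clock periods per second (a 12.5 ns period) and that the clock count is the value left in the argument block by the simulated routine. The operation counts are the two expressions the driver passes to CHECK; the function names are hypothetical.

```python
CLOCKS_PER_MICROSECOND = 80.0   # 12.5 ns clock period


def tridec_ops(nsys, neqs):
    """Operation count the driver charges to the decomposition step."""
    return nsys * (1 + (neqs - 1) * 4)


def trislv_ops(nsys, neqs):
    """Operation count the driver charges to the back-substitution step."""
    return nsys * ((neqs - 1) * 2 + neqs * 3)


def mflops(nops, clock_periods):
    """Millions of floating point operations per second, 0.0 if no clocks ran."""
    if clock_periods == 0:
        return 0.0
    return nops * CLOCKS_PER_MICROSECOND / clock_periods


if __name__ == "__main__":
    nops = tridec_ops(64, 100)
    # The clock count below is an illustrative value, not a measured result.
    print(nops, mflops(nops, 50816))
```

Guarding against a zero clock count matches the listing, which leaves the rate at 0.0 when the timer has not advanced.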


Appendix J
Load Module Formats

1. Relocatable Modules

Relocatable modules consist of seven types of binary records: an IDEM
record, one or more TXT records, zero or more RLD, EXT, ENTR, and
SYM records, and an END record.

An  IDEM  record  identifies  the  name  of  the  module.  The  record 
consists  of  the  characters  IDEM,  followed  by  4  spaces,  followed  by 
the  8  character  name  of  the  module. 

A TXT record contains the actual object code to be loaded. It
consists of the letters TXT, followed by five spaces, followed by a
four byte binary address of this portion of the module (relative to
the top of the module), followed by a four byte binary length. The
actual text to be loaded is on the following card.

An RLD record identifies the locations in the module which must
be relocated. It consists of the letters RLD, followed by five spaces,
followed by one or more 8 byte fields. The first 4 bytes of the field
contain the binary address (relative to the top of the module) of the
text to be relocated. The second 4 bytes contain a number describing
the type of relocation to be performed. See RLD & EXT types, below.

An EXT record identifies the locations in the module which refer
to external locations. It consists of the letters EXT, followed by 5
spaces, followed by one or more 16 byte fields. The first 8 bytes of
the field contain the 8 character name of the external location
referenced. The next four bytes contain the binary address (relative to
the top of the module) of the text referencing the external. The last
4 bytes contain a number describing the type of reference. See RLD &
EXT types, below.

An  ENTR  record  identifies  entry  points  in  the  module.  It  consists 
of  the  letters  ENTR,  followed  by  4  spaces,  followed  by  one  or  more  12 
byte  fields.  The  first  8  bytes  of  the  field  contain  the  name  of  the 
entry  point,  and  the  last  4  bytes  contain  the  address  (relative  to 
the  top  of  the  module)  of  the  entry  point. 
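The fixed layouts described above lend themselves to a small decoder. The following Python sketch splits RLD, EXT and ENTR records into their fields. It rests on several assumptions that the text does not state: characters are EBCDIC (decoded here with the `cp037` codec), binary values are unsigned big-endian, and the caller passes a record holding only the 8 byte header and whole fields, with any card padding already removed. All function names are hypothetical.

```python
import struct

FIELD_SIZES = {"RLD": 8, "EXT": 16, "ENTR": 12}   # bytes per repeated field


def split_fields(record, encoding="cp037"):
    """Return (tag, list of raw fields) for an RLD, EXT or ENTR record."""
    tag = record[:8].decode(encoding).rstrip()     # tag padded to 8 characters
    if tag not in FIELD_SIZES:
        raise ValueError(f"unsupported record type {tag!r}")
    size = FIELD_SIZES[tag]
    body = record[8:]
    if not body or len(body) % size:
        raise ValueError("record must hold one or more whole fields")
    return tag, [body[i:i + size] for i in range(0, len(body), size)]


def parse_record(record, encoding="cp037"):
    """Decode each field into a tuple of names and integers."""
    tag, fields = split_fields(record, encoding)
    out = []
    for f in fields:
        if tag == "RLD":                            # address, relocation type
            out.append(struct.unpack(">II", f))
        elif tag == "EXT":                          # name, address, type
            name = f[:8].decode(encoding).rstrip()
            out.append((name,) + struct.unpack(">II", f[8:]))
        else:                                       # ENTR: name, address
            name = f[:8].decode(encoding).rstrip()
            out.append((name,) + struct.unpack(">I", f[8:]))
    return tag, out


if __name__ == "__main__":
    rec = "ENTR    SQUARE  ".encode("cp037") + struct.pack(">I", 0o20)
    print(parse_record(rec))
```

Keeping the field widths in one table makes it straightforward to add the remaining record types once their layouts are known.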
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APPENDIX  K 

Task  Definition  File  Description 

By  issuing  the  command  "SET  TASK=ON"  from  the  simulator  com¬ 
mand  language,  task  timing  is  enabled.  The  simulator  immediately 
prompts  for  an  input  file  defining  the  task  locations  in  simulator 
memory. 

The structure of the input file is as follows:

<TACT Report Header, 1 to 40 characters>

<Clock skip> <Pagination flag> <Compression flag>

<Task name> <Task entry point> <Task exit point>

<TACT Report Header> is a title which appears at the top of every
TACT Report page. The title can be up to 40 characters in length.

<Clock skip> is the number of clocks to skip between records in
the TACT report.

<Pagination flag> is set to 1 if pagination of the TACT report is
desired (line printer), 0 if pagination is not desired (terminal).

<Compression flag> is set to 1 if TACT report compression is desired,
0 if multiple identical records are desired.

<Task name> is a 6 character identifier for the task, to be printed
on the TACT report.

<Task entry point> is an address or label defining the starting point
of the task.

<Task exit point> is an address or label defining the ending point of
the task.

The last record can be repeated up to 30 times so that up to 30
tasks can be defined at one time. A task can begin on the same point
that another task ends, but tasks cannot overlap in memory.

>      1     8X8 SPARSE
>      2     100 0
>      3     FAC1    B1      B2
>      4     JOIN1   B14     B15
>      5     MUL2    B31     B32
>      7     JOIN4   SI 60   B18
>      8     JOIN5   B16     B17
>      9     SOL1    B7      B8
>     10     MUL1    B4      B5
>     11     MUL     365A    421A
>     12     SOL     270A    340A
>     13     FAC     425A    500C
#End of file

Listing  of  Sample  Task  Definition  File 
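A task definition file can be checked against the rules above before it is handed to the simulator. The Python sketch below reads the three record types and enforces the stated limits (40 character header, flags of 0 or 1, 6 character task names, at most 30 tasks). It assumes that the fields on a record are separated by blanks, which the description does not specify, and it leaves entry and exit points as text because they may be either addresses or labels. The names `Task` and `parse_task_file` are hypothetical.

```python
from dataclasses import dataclass

MAX_TASKS = 30


@dataclass
class Task:
    name: str          # up to 6 characters, printed on the TACT report
    entry_point: str   # address or label, kept as text
    exit_point: str    # address or label, kept as text


def parse_task_file(text):
    """Return (header, clock_skip, pagination, compression, tasks)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("need a header, a flags record and at least one task")
    header = lines[0]
    if len(header) > 40:
        raise ValueError("report header is limited to 40 characters")
    flags = lines[1].split()
    if len(flags) != 3:
        raise ValueError("second record needs clock skip and two flags")
    clock_skip, pagination, compression = (int(f) for f in flags)
    if pagination not in (0, 1) or compression not in (0, 1):
        raise ValueError("flags must be 0 or 1")
    tasks = []
    for ln in lines[2:]:
        parts = ln.split()
        if len(parts) != 3:
            raise ValueError(f"bad task record: {ln!r}")
        if len(parts[0]) > 6:
            raise ValueError(f"task name too long: {parts[0]!r}")
        tasks.append(Task(*parts))
    if len(tasks) > MAX_TASKS:
        raise ValueError("at most 30 tasks can be defined")
    return header, clock_skip, bool(pagination), bool(compression), tasks


if __name__ == "__main__":
    sample = "8X8 SPARSE\n100 0 1\nFAC1 B1 B2\nMUL 365A 421A\n"
    print(parse_task_file(sample))
```

The sketch does not test for overlapping tasks, since that requires resolving labels to addresses in simulator memory.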
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APPENDIX  L 

Example  Use  of  Simulator  and  Cross  Assembler 

The  following  pages  show  a  sample  terminal  session  in  which  the 
Cross  Assembler  and  Simulator  package  is  used  to  assemble  and  execute 
a  simple  CAL  code  using  a  Fortran  driver. 

The  first  part  of  the  Fortran  driver  is  the  common  block  contain¬ 
ing  the  simulated  CRAY  memory  (see  section  2.3.2).  The  next  portion 
initializes  the  simulator  and  loads  the  cross  assembled  object  module. 
Next,  the  values  to  be  squared  are  loaded  into  the  simulator  memory 
at  address  200  octal  for  a  vector  length  of  100  octal  (MEM  array 
subscripts  129  through  192).  After  the  object  code  and  operands  have 
been  loaded,  all  that  remains  is  to  run  the  simulation  and  retrieve 
the  results  from  simulator  memory,  which  the  next  two  sections  of  the 
driver  perform.  The  results,  of  course,  are  the  squares  of  the  first 
64  integers,  as  we  expected. 
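The subscript range quoted above follows from the one-word offset between CRAY addresses and the Fortran MEM array, where MEM(1) holds address zero. A short Python check of that arithmetic, using a hypothetical helper `mem_subscript`:

```python
def mem_subscript(octal_address):
    """MEM subscript (1-based) for a CRAY address written in octal."""
    return int(octal_address, 8) + 1


first = mem_subscript("200")                 # start of the vector
length = int("100", 8)                       # vector length in words
last = first + length - 1                    # final element of the vector

if __name__ == "__main__":
    print(first, last, length)
```

The same offset explains why the driver stores its operands at MEM(128+I) for I from 1 to 64.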


#$LIST FTN.TEST
C
C ... FORTRAN DRIVER FOR VECTOR SQUARE
C
C     COMMON BLOCK FOR CRAY-1 MEMORY.
C
      DOUBLE PRECISION MEM
      COMMON /MSIZE/ MEMSIZ
      INTEGER*2 HMEM(1)
      INTEGER IMEM(2,4096)
      COMMON /MEMORY/ MEM(4096)
      EQUIVALENCE (MEM(1), IMEM(1,1), HMEM(1))
C
C ... INITIALIZE
      CALL CRAY1('INIT; LOAD CAL.O; RETURN!', .TRUE.)
C
C ... SET UP VECTOR TO SQUARE
      DO 10 I=1,64
      MEM(128+I) = 1.0D0 * I
   10 CONTINUE
C
C ... RUN THE CAL CODE
      CALL CRAY1('CH P SQUARE; SET TIM=OFF; RUN; RETURN!', .TRUE.)
C
C ... GET RESULT
      WRITE(6,100) (MEM(128+I), I=1,64)
  100 FORMAT(' ', 4F10.2)
C
      STOP
      END
#$RUN *FTN SCARDS=FTN.TEST SPUNCH=-O
#Execution begins 13:12:06
 No errors in MAIN
#Execution terminated 13:12:07 T=0.039 $0.02
#$LIST CAL.SQUARE
         IDENT  SQUARE
         BASE   O
         ABS
         ORG    20
SQUARE   A1     100         SET VECTOR LENGTH TO 64
         VL     A1
         A0     200         LOAD VECTOR TO SQUARE
         V1     ,A0,A0
         V2     V1*FV1      SQUARE THE VECTOR
         ,A0,A0 V2          STORE THE RESULT


#$RUN SF01:CAL SCARDS=CAL.SQUARE SPUNCH=CAL.O SPRINT=*DUMMY*
#Execution begins 15:12:09
#Execution terminated 15:12:10 T=0.056 $0.04
#$RUN -O+K350 : i !P . 4
#Execution begins 15:12:13
 INIT
 LOAD CAL.O
 RETURN
 CH P SQUARE
 SQUARE DEFINITION USED FROM IDENT SQUARE
 SET TIM=OFF
 RUN
 EXIT 0 AT 22A CPU = 1
 RETURN
       1.00      4.00      9.00     16.00
      25.00     36.00     49.00     64.00
      81.00    100.00    121.00    144.00
     169.00    196.00    225.00    256.00
     289.00    324.00    361.00    400.00
     441.00    484.00    529.00    576.00
     625.00    676.00    729.00    784.00
     841.00    900.00    961.00   1024.00
    1089.00   1156.00   1225.00   1296.00
    1369.00   1444.00   1521.00   1600.00
    1681.00   1764.00   1849.00   1936.00
    2025.00   2116.00   2209.00   2304.00
    2401.00   2500.00   2601.00   2704.00
    2809.00   2916.00   3025.00   3136.00
    3249.00   3364.00   3481.00   3600.00
    3721.00   3844.00   3969.00   4096.00
#Execution terminated 15:12:16 T=0.055
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